Notes on building Ollama from source on Windows 11 to run llama3.1 locally

Tags: github Ollama llama3.1 server com Win11 debug GIN ollama

Environment

Windows 11 (GPU: AMD Radeon RX 6650 XT)
VS Code (used to search the source and add gfx1032 next to gfx1030)
Git

Go version

$ go version
go version go1.23.3 windows/amd64

MinGW (the build needs the make command)

$ make -v
GNU Make 4.4.1
Built for x86_64-w64-mingw32
Copyright (C) 1988-2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

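Note: if make -v fails after MinGW has been added to the PATH environment variable, go to mingw64\bin, make a copy of mingw32-make.exe, and rename the copy to make.exe (the same applies when installing Perl).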

Installing the AMD HIP SDK for Windows

Download: HIP SDK 6.1.2
After installation, add the HIP SDK to the PATH environment variable (e.g. %HIP_PATH_61%\bin)

Run hipinfo; it shows that the gcnArchName for the AMD Radeon RX 6650 XT is gfx1032

$ hipinfo

--------------------------------------------------------------------------------
device#                           0
Name:                             AMD Radeon RX 6650 XT
pciBusID:                         3
pciDeviceID:                      0
pciDomainID:                      0
multiProcessorCount:              16
maxThreadsPerMultiProcessor:      2048
isMultiGpuBoard:                  0
clockRate:                        2410 Mhz
memoryClockRate:                  1095 Mhz
memoryBusWidth:                   0
totalGlobalMem:                   7.98 GB
totalConstMem:                    2147483647
sharedMemPerBlock:                64.00 KB
canMapHostMemory:                 1
regsPerBlock:                     0
warpSize:                         32
l2CacheSize:                      4194304
computeMode:                      0
maxThreadsPerBlock:               1024
maxThreadsDim.x:                  1024
maxThreadsDim.y:                  1024
maxThreadsDim.z:                  1024
maxGridSize.x:                    2147483647
maxGridSize.y:                    65536
maxGridSize.z:                    65536
major:                            10
minor:                            3
concurrentKernels:                1
cooperativeLaunch:                0
cooperativeMultiDeviceLaunch:     0
isIntegrated:                     0
maxTexture1D:                     16384
maxTexture2D.width:               16384
maxTexture2D.height:              16384
maxTexture3D.width:               2048
maxTexture3D.height:              2048
maxTexture3D.depth:               2048
hostNativeAtomicSupported:        1
isLargeBar:                       0
asicRevision:                     0
maxSharedMemoryPerMultiProcessor: 64.00 KB
clockInstructionRate:             1000.00 Mhz
arch.hasGlobalInt32Atomics:       1
arch.hasGlobalFloatAtomicExch:    1
arch.hasSharedInt32Atomics:       1
arch.hasSharedFloatAtomicExch:    1
arch.hasFloatAtomicAdd:           1
arch.hasGlobalInt64Atomics:       1
arch.hasSharedInt64Atomics:       1
arch.hasDoubles:                  1
arch.hasWarpVote:                 1
arch.hasWarpBallot:               1
arch.hasWarpShuffle:              1
arch.hasFunnelShift:              0
arch.hasThreadFenceSystem:        1
arch.hasSyncThreadsExt:           0
arch.hasSurfaceFuncs:             0
arch.has3dGrid:                   1
arch.hasDynamicParallelism:       0
gcnArchName:                      gfx1032
peers:
non-peers:                        device#0

memInfo.total:                    7.98 GB
memInfo.free:                     7.85 GB (98%)
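
If you would rather not read through the whole dump, the gcnArchName line can be pulled out with a few lines of Python. This is only a sketch: the function name find_gcn_arch_names is my own, and the sample text is a shortened copy of the output above.

```python
import re


def find_gcn_arch_names(hipinfo_text: str) -> list[str]:
    """Return the gcnArchName value reported for each device in hipinfo output."""
    return re.findall(r"^gcnArchName:[ \t]*(\S+)", hipinfo_text, flags=re.MULTILINE)


# Shortened copy of the hipinfo output shown above.
sample = """\
device#                           0
Name:                             AMD Radeon RX 6650 XT
gcnArchName:                      gfx1032
"""

print(find_gcn_arch_names(sample))  # ['gfx1032']
```

To run it against the real tool, pass in the text captured from hipinfo (for example via subprocess.run(["hipinfo"], capture_output=True, text=True).stdout), assuming hipinfo is on PATH.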

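The list of GPUs officially supported by AMD ROCm does not include the AMD Radeon RX 6650 XT, but prebuilt rocblas libraries are available for it.
In ROCmLibs for HIP SDK 6.1.2, find rocm.gfx1032.for.hip.sdk.6.1.2.optimized.Fremont.Dango.Version.7z and download it (this build is the newer one, so it is the one I used).
After extracting the archive (back up the files below first so you can roll back if something goes wrong ovo):
1. Copy rocblas.dll to C:\Program Files\AMD\ROCm\6.1\bin
2. Copy the library directory to C:\Program Files\AMD\ROCm\6.1\bin\rocblas (choose to replace all files)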
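
The backup and the two copy steps can also be scripted. The sketch below is not what I actually ran (I copied the files by hand); install_rocblas and its parameters are hypothetical names, and it assumes the extracted archive has rocblas.dll and library at its top level.

```python
import shutil
from pathlib import Path


def install_rocblas(extracted: Path, rocm_bin: Path, backup: Path) -> None:
    """Back up the stock rocblas files, then copy the gfx1032 build over them."""
    dll = rocm_bin / "rocblas.dll"
    library = rocm_bin / "rocblas" / "library"

    # Keep the originals so the change can be rolled back.
    backup.mkdir(parents=True, exist_ok=True)
    if dll.exists():
        shutil.copy2(dll, backup / "rocblas.dll")
    if library.exists():
        shutil.copytree(library, backup / "library", dirs_exist_ok=True)

    # Step 1: replace rocblas.dll.
    shutil.copy2(extracted / "rocblas.dll", dll)
    # Step 2: copy the library directory, overwriting files that already exist.
    shutil.copytree(extracted / "library", library, dirs_exist_ok=True)
```

A call would look like install_rocblas(Path("extracted"), Path(r"C:\Program Files\AMD\ROCm\6.1\bin"), Path("rocblas_backup")), where the first and last paths are placeholders. Writing under C:\Program Files normally needs an elevated prompt.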

Building Ollama

Clone the Ollama repository

# Note: the version used in this experiment is ollama 0.4.0
git clone https://github.com/ollama/ollama.git

Open the ollama source in VS Code and add gfx1032 in ollama/llama/make/Makefile.rocm (a global search for gfx1030 also finds the file)

# Original
HIP_ARCHS_COMMON := gfx900 gfx940 gfx941 gfx942 gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102

# Add gfx1032 so that the built ollama_llama_server.exe supports the AMD Radeon RX 6650 XT
HIP_ARCHS_COMMON := gfx900 gfx940 gfx941 gfx942 gfx1010 gfx1012 gfx1030 gfx1032 gfx1100 gfx1101 gfx1102
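
The same one-line change can be applied programmatically, which is handy if you rebuild after pulling new commits. A minimal sketch, assuming the file still defines HIP_ARCHS_COMMON with := as shown above; add_hip_arch is a name I made up.

```python
def add_hip_arch(makefile_text: str, new_arch: str = "gfx1032", after: str = "gfx1030") -> str:
    """Insert new_arch right after `after` in the HIP_ARCHS_COMMON := line."""
    patched = []
    for line in makefile_text.splitlines(keepends=True):
        name, sep, value = line.partition(":=")
        if sep and name.strip() == "HIP_ARCHS_COMMON":
            archs = value.split()
            # Only touch the line if the anchor exists and the arch is missing.
            if after in archs and new_arch not in archs:
                archs.insert(archs.index(after) + 1, new_arch)
                ending = line[len(line.rstrip("\r\n")):]  # keep LF or CRLF
                line = f"HIP_ARCHS_COMMON := {' '.join(archs)}{ending}"
        patched.append(line)
    return "".join(patched)


original = (
    "HIP_ARCHS_COMMON := gfx900 gfx940 gfx941 gfx942 gfx1010 gfx1012 "
    "gfx1030 gfx1100 gfx1101 gfx1102\n"
)
print(add_hip_arch(original), end="")
```

Running the function twice leaves the line unchanged the second time, so it is safe to re-run. To patch the real file, read llama/make/Makefile.rocm, pass the text through add_hip_arch, and write it back.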

Run the following commands in order
```
$ export CGO_ENABLED="1"
```
```
$ go generate ./...
```
```
$ go build .
```
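Note: run the commands from the root of the cloned ollama repository (I use the Git Bash shell, hence the leading $; drop the $ when copying a command)

When the build finishes, an ollama.exe file appears in the ollama root directory. Start the server to test it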

$ ./ollama.exe serve
2024/11/08 21:38:18 routes.go:1189: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\cphovo\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-11-08T21:38:18.488+08:00 level=INFO source=images.go:755 msg="total blobs: 5"
time=2024-11-08T21:38:18.488+08:00 level=INFO source=images.go:762 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2024-11-08T21:38:18.490+08:00 level=INFO source=routes.go:1240 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-11-08T21:38:18.491+08:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 rocm]"
time=2024-11-08T21:38:18.491+08:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-11-08T21:38:18.492+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2024-11-08T21:38:18.492+08:00 level=INFO source=gpu_windows.go:183 msg="efficiency cores detected" maxEfficiencyClass=1
time=2024-11-08T21:38:18.492+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=12 efficiency=4 threads=20
time=2024-11-08T21:38:18.937+08:00 level=INFO source=types.go:123 msg="inference compute" id=0 library=rocm variant="" compute=gfx1032 driver=6.1 name="AMD Radeon RX 6650 XT" total="8.0 GiB" available="7.8 GiB"

Note: a line like time=2024-11-08T21:38:18.937+08:00 level=INFO source=types.go:123 msg="inference compute" id=0 library=rocm variant="" compute=gfx1032 driver=6.1 name="AMD Radeon RX 6650 XT" total="8.0 GiB" available="7.8 GiB" in the server log means the self-built ollama can use the AMD Radeon RX 6650 XT for acceleration.

Download and run the model
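
If the server log is long, this check can be automated. The sketch below splits a log line into key=value fields and looks for an "inference compute" entry whose library is rocm. The function names are hypothetical, and the field layout is taken only from the log line above, so other Ollama versions may print something different.

```python
import shlex
from typing import Optional


def parse_inference_compute(log_line: str) -> Optional[dict]:
    """Return the key=value fields of an "inference compute" line, else None."""
    try:
        tokens = shlex.split(log_line)  # handles values quoted with "..."
    except ValueError:
        return None  # unbalanced quotes: not a line we care about
    fields = dict(token.split("=", 1) for token in tokens if "=" in token)
    return fields if fields.get("msg") == "inference compute" else None


def uses_rocm(log_text: str) -> bool:
    """True if any line of the log reports a rocm inference device."""
    for line in log_text.splitlines():
        fields = parse_inference_compute(line)
        if fields and fields.get("library") == "rocm":
            return True
    return False


line = (
    'time=2024-11-08T21:38:18.937+08:00 level=INFO source=types.go:123 '
    'msg="inference compute" id=0 library=rocm variant="" compute=gfx1032 '
    'driver=6.1 name="AMD Radeon RX 6650 XT" total="8.0 GiB" available="7.8 GiB"'
)
print(parse_inference_compute(line)["compute"])  # gfx1032
print(uses_rocm(line))  # True
```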
```
$ ./ollama.exe run llama3.1
```
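At this point you may get the following error: The code execution cannot proceed because ggml_rocm.dll was not found. Reinstalling the program may fix this problem.

However, dist\windows-amd64\lib\ollama does contain a ggml_rocm.dll file.

Fix: make a copy of dist\windows-amd64\lib\ollama\ggml_rocm.dll and put it in dist\windows-amd64\lib\ollama\runners\rocm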
```
$ cd dist/windows-amd64/lib/ollama/runners/rocm/

$ ls -al
total 349176
drwxr-xr-x 1 cphovo 197121         0 11月  8 21:45 .
drwxr-xr-x 1 cphovo 197121         0 11月  8 20:36 ..
-rwxr-xr-x 1 cphovo 197121 348145152 11月  8 20:34 ggml_rocm.dll
-rwxr-xr-x 1 cphovo 197121   9406976 11月  8 20:36 ollama_llama_server.exe
```
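
Because the dist directory is regenerated by the build, the copy has to be repeated after each rebuild. A small helper like the following can do it; this is a sketch with a made-up function name, using the paths from the listing above relative to the repository root.

```python
import shutil
from pathlib import Path


def copy_rocm_dll(repo_root: Path) -> Path:
    """Copy ggml_rocm.dll next to ollama_llama_server.exe and return the new path."""
    lib_dir = repo_root / "dist" / "windows-amd64" / "lib" / "ollama"
    src = lib_dir / "ggml_rocm.dll"
    dst_dir = lib_dir / "runners" / "rocm"
    if not src.is_file():
        raise FileNotFoundError(src)
    dst_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dst_dir / src.name))
```

From the ollama root directory, copy_rocm_dll(Path(".")) performs the copy.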
Run ./ollama.exe run llama3.1 again and you should see the following (the first run downloads the model):
```
$ ./ollama.exe run llama3.1
>>> Send a message (/? for help)
```

Test

$ ./ollama.exe run llama3.1
>>> Implement binary search in Python; give the code only
```python
def binary_search(arr, low, high, x):
    if high >= low:
        mid = (high + low) // 2
        if arr[mid] == x:
            return mid
        elif arr[mid] > x:
            return binary_search(arr, low, mid - 1, x)
        else:
            return binary_search(arr, mid + 1, high, x)
    else:
        return -1

arr = [2, 3, 4, 10, 40]
x = 10
result = binary_search(arr, 0, len(arr)-1, x)

if result != -1:
    print("Element is present at index", str(result))
else:
    print("Element is not present in array")
```

Task Manager now shows the GPU being used as expected, rather than llama3.1 running on the CPU, and generation is many times faster than on the CPU.
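
Besides Task Manager, the server itself can be asked where a loaded model lives through the /api/ps route listed in the startup log. The sketch below is an assumption-heavy illustration: the size and size_vram field names follow my understanding of the Ollama API documentation and I did not verify them against 0.4.0, the sample response is hypothetical, and both function names are my own.

```python
import json
import urllib.request


def fetch_running_models(base_url: str = "http://127.0.0.1:11434") -> dict:
    """GET /api/ps from a running Ollama server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)


def gpu_share(ps_response: dict) -> dict[str, float]:
    """Map each loaded model name to the fraction of its size held in VRAM."""
    shares = {}
    for model in ps_response.get("models", []):
        size = model.get("size", 0)        # assumed field: total bytes loaded
        vram = model.get("size_vram", 0)   # assumed field: bytes placed in VRAM
        shares[model.get("name", "?")] = vram / size if size else 0.0
    return shares


# Hypothetical response body, not captured from a real server.
sample = {"models": [{"name": "llama3.1:latest", "size": 1000, "size_vram": 1000}]}
print(gpu_share(sample))  # {'llama3.1:latest': 1.0}
```

With the server running and a model loaded, gpu_share(fetch_running_models()) should give a value near 1.0 when the model sits entirely on the GPU.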

Issues encountered

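My machine originally had HIP SDK 5.7 installed, but after following the same steps and starting the ollama server, inference still ran on the CPU. After uninstalling 5.7 and installing HIP SDK 6.1, the experiment succeeded.
As for why the error "The code execution cannot proceed because ggml_rocm.dll was not found. Reinstalling the program may fix this problem." appears: I found no explanation in the project's issues. However, in the ollama_orcm files downloaded from some Bilibili videos, the directory containing ollama_llama_server.exe also contained a llama.dll file, so I tried putting a copy of the built ggml_rocm.dll into the directory containing ollama_llama_server.exe. It felt like magic, but the problem went away (and saved me from filing an issue, happy ovo). A likely explanation is the Windows DLL search order, which checks the directory of the executable first.
The wiki I followed says Strawberry Perl is required for the build, but on my machine the only failure was a missing make command when running go generate ./...; after renaming mingw32-make.exe in mingw64 to make.exe the build succeeded, so I am not sure Perl is actually needed.
Choose a model whose size does not exceed the GPU's VRAM.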
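
The last point can be turned into a rough check before downloading a model. This is back-of-the-envelope arithmetic only: the 1 GiB headroom is my own guess for context and runtime buffers rather than a measured figure, and fits_in_vram is a hypothetical helper.

```python
GIB = 1024 ** 3


def fits_in_vram(model_bytes: int, available_bytes: int, headroom_bytes: int = GIB) -> bool:
    """True if the model file plus some headroom fits in the available VRAM."""
    return model_bytes + headroom_bytes <= available_bytes


available = int(7.8 * GIB)  # "available" value from the server log above
print(fits_in_vram(5 * GIB, available))   # True: a 5 GiB model leaves room
print(fits_in_vram(16 * GIB, available))  # False: a 16 GiB model does not fit
```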

Source: https://www.cnblogs.com/cphovo/p/18536054
