Ollama is an open-source framework for running large language models (LLMs) locally. It supports a range of operating systems, but FreeBSD is not one of them, so I tried building and installing it on FreeBSD myself.
Conclusion up front: the official ollama source did not build, but a patched fork installed successfully. Since the fork modifies the code, I did all of this inside a FreeBSD jail to be safe.
Installing ollama on FreeBSD (first attempt, failed)
Setting up the build environment
First install the latest Go:
pkg install go122-1.22.5 cmake
That didn't work out as expected (the versioned package installs a binary named go122, which is what you have to invoke), so I installed the default go package instead:
pkg install go
But that version is too old.
Let's try a newer release. Download https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz:
wget https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz
Extract it:
tar -xzvf go1.22.5.freebsd-amd64.tar.gz
Add it to the PATH (the tarball unpacks into a go/ directory; here it was extracted under /home/skywalk/work):
export PATH=/home/skywalk/work/go/bin:$PATH
Now go reports version 1.22.5:
$ go version
go version go1.22.5 freebsd/amd64
Speeding up Go module downloads
# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Set environment variable to allow bypassing the proxy for specified repos (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private
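You can confirm the toolchain actually sees these settings by printing them back:
go env GOPROXY GOPRIVATE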
Building ollama
Clone ollama from the official repository:
git clone https://github.com/ollama/ollama
Generate:
go generate ./...
Build:
go build .
The build failed here, ending with this error:
skywalk@fbhost:~/github/ollama $ go build .
package github.com/ollama/ollama
imports github.com/ollama/ollama/cmd
imports github.com/ollama/ollama/server
imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cudart.c gpu_info_nvcuda.c gpu_info_nvml.c gpu_info_oneapi.c
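This error appears when Go is building with cgo disabled (CGO_ENABLED=0), so it refuses to compile the C probe files in the gpu package. As an untested guess, checking and forcing cgo might at least change the failure mode, though upstream ollama had no FreeBSD support in that package at this point:
go env CGO_ENABLED CC
CGO_ENABLED=1 go build .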
Debugging in a FreeBSD jail (second attempt, failed)
Create a FreeBSD jail and log in:
# cbsd jlogin fb12
The shell after login is csh; if that does not suit you, switch to bash.
Install the required packages:
# pkg install -y git go122 cmake vulkan-headers vulkan-loader
Download the patched fork:
# git clone --depth 1 https://github.com/prep/ollama.git
(The shallow clone caused trouble below; a full git clone https://github.com/prep/ollama.git is what works.)
Switch to the BSD branch (this checkout failed at this point; the --depth 1 clone was to blame, as the debugging notes below explain):
# cd ollama && git checkout feature/add-bsd-support
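A shallow clone only fetches the default branch; you can confirm which branches the local repository actually knows about before blaming anything else:
# git branch -a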
Set up the proxy acceleration first.
Under csh (note that csh needs setenv, not set, for variables that child processes such as go can see):
# setenv GO111MODULE on
# setenv GOPROXY https://goproxy.io,direct
# setenv GOPRIVATE git.mycompany.com,github.com/my/private
Under bash:
# Enable Go modules
export GO111MODULE=on
# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Set environment variable to allow bypassing the proxy for specified repos (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private
Run go generate and build:
# go122 generate ./...
# go122 build .
It ended with this error:
go122 build .
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/[email protected]: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/[email protected]: invalid version: unknown revision 64ccaa8d9ca9
Building the patched ollama as a regular user in the FreeBSD jail (third attempt, success)
If errors come up, the go.sum and go.mod files need editing (details in the debugging notes below).
Use the following commands, under bash:
mkdir github.com
cd github.com
git clone https://github.com/prep/ollama.git
cd ollama && git checkout feature/add-bsd-support
# Enable Go modules
export GO111MODULE=on
# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Set environment variable to allow bypassing the proxy for specified repos (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private
go122 generate ./...
go122 build .
Debugging the errors
There was still an error from go122 build .:
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/[email protected]: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/[email protected]: invalid version: unknown revision 64ccaa8d9ca9
Edit go.sum and replace the github.com/pdevine/tensor entries with:
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=
go.mod needs the same treatment; change the pdevine/tensor version to the latest (May 10) revision:
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
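Instead of editing by hand, the same change can be scripted with go mod edit, and go mod tidy then recomputes the matching go.sum entries (a sketch of the equivalent steps, not exactly what was run here):
go122 mod edit -require=github.com/pdevine/tensor@v0.0.0-20240510204454-f88f4562727c
go122 mod tidy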
Then re-run generate and build.
Depending on the situation, if you skip regenerating, the error output suggests a go get is enough:
go122 get github.com/ollama/ollama/convert
Then continue with the build:
go122 build .
Done!
A quick test:
./ollama help | head -n 5
Large language model runner
Usage:
ollama [flags]
ollama [command]
That proves the build really succeeded!
Starting ollama
First start the ollama server:
./ollama serve
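With the server up, you can sanity-check its HTTP API from another terminal (11434 is ollama's default listen port; adjust if yours differs):
curl http://127.0.0.1:11434/api/tags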
Run the llama3 model:
./ollama run llama3
ollama downloads the model automatically; once the download finishes, you land in an interactive prompt.
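If you prefer to separate the download from the chat session, ollama pull fetches a model without starting one:
./ollama pull llama3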
ollama's interactive output
One answer took 50 minutes... but at least it worked: ollama ran successfully on FreeBSD!
[skywalk@fb12 ~/gihub.com/ollama]$ ./ollama run llama3
[GIN] 2024/07/15 - 12:01:47 | 200 | 466.704µs | 10.0.0.12 | HEAD "/"
[GIN] 2024/07/15 - 12:01:47 | 404 | 450.54µs | 10.0.0.12 | POST "/api/show"
pulling manifest
time=2024-07-15T12:01:50.016+08:00 level=INFO source=download.go:136 msg="downloading 6a0746a1ec1a in 47 100 MB part(s)"
pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕████████████████▏ 12 KB
pulling 8ab4849b038c... 100% ▕████████████████▏ 254 B
pulling 577073ffcc6c... 100% ▕████████████████▏ 110 B
pulling 3f8eb4da87fa... 100% ▕████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
[GIN] 2024/07/15 - 12:22:06 | 200 | 1.786897ms | 10.0.0.12 | POST "/api/show"
[GIN] 2024/07/15 - 12:22:06 | 200 | 1.384117ms | 10.0.0.12 | POST "/api/show"
time=2024-07-15T12:22:06.288+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
⠴ time=2024-07-15T12:22:20.820+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-07-15T12:22:20.821+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 62268"
time=2024-07-15T12:22:20.847+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-07-15T12:22:20.847+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x10139f812000","timestamp":1721017340}
⠦ {"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x10139f812000","timestamp":1721017340}
{"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x10139f812000","timestamp":1721017340,"total_threads":4}
⠧ llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
⠇ llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
⠏ llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
⠙ llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
⠹ llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: CPU buffer size = 4437.80 MiB
.......................................................................................
⠸ llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
⠦ llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
⠧ {"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x10139f812000","timestamp":1721017395}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"62268","tid":"0x10139f812000","timestamp":1721017395}
{"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x10139f812000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":37211,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":60236,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":43135,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":31620,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56527,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":53213,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":21875,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":47567,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56264,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
⠇ [GIN] 2024/07/15 - 12:23:15 | 200 | 1m8s | 10.0.0.12 | POST "/api/chat"
>>> hello
(Two hours had passed since the pull, so the model had been unloaded in the meantime; the serve log shows it loading all over again:)
time=2024-07-15T14:22:47.710+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
⠋ time=2024-07-15T14:23:02.785+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
⠙ time=2024-07-15T14:23:02.789+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 61604"
time=2024-07-15T14:23:02.811+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-07-15T14:23:02.812+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x20da49412000","timestamp":1721024582}
{"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x20da49412000","timestamp":1721024582}
{"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x20da49412000","timestamp":1721024582,"total_threads":4}
⠸ llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
⠼ llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
⠧ llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
⠇ llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.15 MiB
⠙ llm_load_tensors: CPU buffer size = 4437.80 MiB
.......................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
⠴ llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
⠦ {"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x20da49412000","timestamp":1721024651}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"61604","tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":48229,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33319,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":54187,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":28162,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33773,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":19633,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":35779,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":18413,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
⠧ {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":9,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
⠇ {"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/tokenize","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":10,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
⠏ {"function":"launch_slot_with_data","level":"INFO","line":833,"msg":"slot is processing task","slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1817,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":10,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","level":"INFO","line":1841,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
Hello! It's nice to meet you. Is there something I can help you with, or
would you like to chat?{"function":"print_timings","level":"INFO","line":276,"msg":"prompt eval time = 106459.91 ms / 10 tokens (10645.99 ms per token, 0.09 tokens per second)","n_prompt_tokens_processed":10,"n_tokens_second":0.09393207617523164,"slot_id":0,"t_prompt_processing":106459.906,"t_token":10645.990600000001,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"print_timings","level":"INFO","line":290,"msg":"generation eval time = 2868918.63 ms / 26 runs (110343.02 ms per token, 0.01 tokens per second)","n_decoded":26,"n_tokens_second":0.00906264811318913,"slot_id":0,"t_token":110343.0241923077,"t_token_generation":2868918.629,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"print_timings","level":"INFO","line":299,"msg":" total time = 2975378.54 ms","slot_id":0,"t_prompt_processing":106459.906,"t_token_generation":2868918.629,"t_total":2975378.535,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"update_slots","level":"INFO","line":1649,"msg":"slot released","n_cache_tokens":36,"n_ctx":2048,"n_past":35,"n_system_tokens":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721027627}
[GIN] 2024/07/15 - 15:13:47 | 200 | 50m59s | 10.0.0.12 | POST "/api/chat"
Summary
ollama can be built on FreeBSD, but only as the patched fork. The official repo: GitHub - ollama/ollama: Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. The fork: https://github.com/prep/ollama
If the fork fails to build, read the error message and update the github.com/pdevine/tensor line in both go.sum and go.mod to the May 10 revision, v0.0.0-20240510204454-f88f4562727c.
The whole setup was debugged successfully on a J1900 CPU with 8 GB of RAM, running FreeBSD 14.1-RELEASE. ollama is extremely slow there, roughly 50 minutes per answer, but at least it genuinely works!
Debugging notes
Error during go build
skywalk@fbhost:~/github/ollama $ go build .
package github.com/ollama/ollama
imports github.com/ollama/ollama/cmd
imports github.com/ollama/ollama/server
imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cudart.c gpu_info_nvcuda.c gpu_info_nvml.c gpu_info_oneapi.c
How did GPU code get in here at all? What setting is wrong? (The import chain in the error shows the gpu package is compiled unconditionally; with cgo disabled, its C probe files cannot be built, as noted above.)
Looking for a way to build on FreeBSD, I went through ollama's issues:
Ollama on FreeBSD · Issue #1102 · ollama/ollama · GitHub
This issue describes a workaround: use another repo:
# pkg install -y git go122 cmake vulkan-headers vulkan-loader
# git clone https://github.com/prep/ollama.git
# cd ollama && git checkout feature/add-bsd-support
# go122 generate ./...
# go122 build .
# ./ollama help | head -n 5
Large language model runner
Usage:
ollama [flags]
ollama [command]
Works fine for me, no problems encountered.
Apparently the main repo used to build on FreeBSD as well, but that stopped working after May 6: Make maximum pending request configurable by dhiltgen · Pull Request #4144 · ollama/ollama · GitHub
git checkout feature/add-bsd-support fails
git checkout feature/add-bsd-support
error: pathspec 'feature/add-bsd-support' did not match any file(s) known to git
It turned out the earlier clone had not fetched the complete repository.
# git clone --depth 1 https://github.com/prep/ollama.git
Switching branch (which failed here):
# cd ollama && git checkout feature/add-bsd-support
You cannot use --depth 1 here; drop it:
git clone https://github.com/prep/ollama.git
After that, git checkout feature/add-bsd-support succeeds.
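An existing shallow clone can also be repaired in place instead of recloning (a sketch; --depth 1 implies --single-branch, so the fetch refspec has to be widened as well):
git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*"
git fetch --unshallow
git checkout feature/add-bsd-support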
What the vulkan-headers and vulkan-loader packages are for
vulkan-headers and vulkan-loader are two key components related to the Vulkan API; they matter when building applications that use Vulkan for graphics and compute. Vulkan is a cross-platform graphics and compute API developed by the Khronos Group, designed for high-performance 3D rendering.
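On FreeBSD you can see what each package actually installs (headers for one, the loader shared library for the other) with pkg:
pkg info -l vulkan-headers | head
pkg info -l vulkan-loader | head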
Build inside the jail fails with "C source files not allowed"
Conclusion up front: GitHub connectivity was acting up.
Building in the jail failed with: imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cpu.c gpu_info_cudart.c
At the same time there were errors about GitHub being unreachable:
fatal: unable to access 'https://github.com/pdevine/tensor/': Failed to connect to github.com port 443 after 75025 ms: Couldn't connect to server
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
package github.com/ollama/ollama
imports github.com/ollama/ollama/cmd
imports github.com/ollama/ollama/server
imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cpu.c gpu_info_cudart.c
convert/gemma.go:12:2: github.com/pdevine/[email protected]: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in /root/go/pkg/mod/cache/vcs/6bf5b14e60582bdf39d55e6388653dd8c2addad6937480b86ddb5a729a838afe: exit status 128:
fatal: unable to access 'https://github.com/pdevine/tensor/': Failed to connect to github.com port 443 after 75025 ms: Couldn't connect to server
convert/gemma.go:13:2: github.com/pdevine/[email protected]: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in /root/go/pkg/mod/cache/vcs/6bf5b14e60582bdf39d55e6388653dd8c2addad6937480b86ddb5a729a838afe: exit status 128:
fatal: unable to access 'https://github.com/pdevine/tensor/': Failed to connect to github.com port 443 after 75025 ms: Couldn't connect to server
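When a pseudo-version like 64ccaa8d9ca9 cannot be resolved, it helps to check whether the repository is reachable at all and which refs it still advertises; if the commit is no longer reachable from any advertised ref, neither the proxy nor a direct fetch can serve it:
git ls-remote https://github.com/pdevine/tensor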
After the first generate, the build did not succeed
+ echo 'go generate completed. LLM runners: cpu cpu_avx cpu_avx2 vulkan'
go generate completed. LLM runners: cpu cpu_avx cpu_avx2 vulkan
[root@fb12 ollama]# go122 build .
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/[email protected]: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/[email protected]: invalid version: unknown revision 64ccaa8d9ca9
Not sure why; quite possibly GitHub still being flaky...
Re-ran generate. Still flaky.
Everything so far had been done as root, so I tried building as a regular user instead.
The regular user hit the same error.
Edit go.sum and replace the github.com/pdevine/tensor entries with:
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=
After that change, go build errored:
go build error: convert/gemma.go:13:2: missing go.sum entry for module providing package
go122 build .
convert/gemma.go:12:2: missing go.sum entry for module providing package github.com/pdevine/tensor (imported by github.com/ollama/ollama/convert); to add:
go get github.com/ollama/ollama/convert
convert/gemma.go:13:2: missing go.sum entry for module providing package github.com/pdevine/tensor/native (imported by github.com/ollama/ollama/convert); to add:
go get github.com/ollama/ollama/convert
The go.mod file also pins a version; change it to the current one:
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
But then another error appeared.
Error after changing the tensor version in go.sum and go.mod
verifying github.com/google/[email protected]: checksum mismatch
downloaded: h1:N8EguYFm2wwdpoNcpchQY0tPs85vOJkboFb2dPxmixo=
go.sum: h1:/PtAHvnBY4Kqnx/xCQ3OIV9uYcSFGScBsWI3Oogeh6w=
SECURITY ERROR
This download does NOT match an earlier download recorded in go.sum.
The bits may have been replaced on the origin server, or an attacker may
have intercepted the download attempt.
go122 generate ./...
go: downloading github.com/google/flatbuffers v1.12.0
go: downloading gonum.org/v1/gonum v0.8.2
verifying github.com/google/[email protected]: checksum mismatch
downloaded: h1:N8EguYFm2wwdpoNcpchQY0tPs85vOJkboFb2dPxmixo=
go.sum: h1:/PtAHvnBY4Kqnx/xCQ3OIV9uYcSFGScBsWI3Oogeh6w=
SECURITY ERROR
This download does NOT match an earlier download recorded in go.sum.
The bits may have been replaced on the origin server, or an attacker may
have intercepted the download attempt.
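The mismatch is between stale entries in go.sum and what the proxy now serves. Assuming you trust the new bits, the usual way out is to delete the offending module's lines from go.sum and let Go recompute them:
go122 mod tidy
As a blunter fallback, go122 clean -modcache discards any partially downloaded module data first.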
At this point I started to suspect the fork itself was broken.
Tried setting go.mod to this: github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
Then run:
go122 get github.com/ollama/ollama/convert
followed by:
go122 build .
And with that, the build finally completed.