Purpose:
Use the google-t5/t5-base pretrained model offline to run several natural language processing tasks, mainly translation. Unfortunately it does not support East Asian languages: T5-base (my Project-22.Ai-1.T5-base) only translates between English, French, Romanian, and German. The code is very simple and barely scratches the surface of using a model locally/offline. Even running a model this small strained my laptop, so the hardware needs an upgrade, and the wallet is already empty.
1. Download the google-t5/t5-base model
a. Download link:
git clone [email protected]:google-t5/t5-base
Note: if you have upgraded or reinstalled git, you need to install git-lfs first.
# Run once after installing Git LFS
git lfs install  # only needs to be run once
git-lfs will not be mentioned again below; assume it stays in sync with my environment.
b. Download the model with the command:
CMD: git clone [email protected]:google-t5/t5-base
[/share/Download/AI] # git lfs install
Updated Git hooks.
Git LFS initialized.
[/share/Download/AI] # git clone [email protected]:google-t5/t5-base
Cloning into 't5-base'...
remote: Enumerating objects: 78, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 78 (delta 8), reused 3 (delta 3), pack-reused 65 (from 1)
Receiving objects: 100% (78/78), 972.74 KiB | 1.42 MiB/s, done.
Resolving deltas: 100% (34/34), done.
Downloading flax_model.msgpack (892 MB)
Error downloading object: flax_model.msgpack (d96ab4b): Smudge error: Error downloading flax_model.msgpack (d96ab4b2e2ac1743c32e80669ec37905151c78d8136ff0ce4ba6566bde6e932f): batch response: Authentication required: Password authentication in git is no longer supported. You must use a user access token or an SSH key instead. See https://huggingface.co/blog/password-git-deprecation
Errors logged to '/share/CACHEDEV1_DATA/Download/Software/ALL-AI/t5-base/.git/lfs/logs/20241112T182252.368534759.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: flax_model.msgpack: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
"Authentication required: Password authentication in git is no longer supported. You must use a user access token or an SSH key instead."
So an access token is required.
2. Log in with a Hugging Face PAT
Get a Hugging Face personal access token (PAT):
- Log in to your Hugging Face account.
- Click your avatar in the top-right corner and choose "Settings".
- Choose "Access Tokens" in the left-hand menu.
Save the token to a file first; it will be used later.
3. Install Python 3.12.2 on the QNAP NAS
a. Check the Python version
I don't remember when this Python 2.7.18 was installed; most likely it shipped with the system.
[~] # python -V
Python 2.7.18
[~] # python3 -V
-sh: python3: command not found
b. Install Python 3.12 in order to install the huggingface_hub library/package
After installing, the reported version was still not the one I wanted, because of the search path:
[/share/CACHEDEV1_DATA/.qpkg/container-station/bin] # which python
/usr/local/bin/python
Open the QPython312 app; it lists the install path /opt/QPython312/bin, i.e. the interpreter is /opt/QPython312/bin/python3.
c. Add python3 to the global $PATH
Location of the profile file on QNAP QTS-5.2:
[~] # vi /opt/etc/profile
Add at the end:
export PATH=/opt/QPython312/bin:/opt/sbin:/your/custom/path:$PATH
Log in again and check the version:
[~] # python3 -V
Python 3.12.2
4. Install the huggingface_hub package
a. Upgrade pip3 first
[~] # pip3 install --upgrade pip
Requirement already satisfied: pip in /opt/QPython312/lib/python3.12/site-packages (24.3.1)
b. Install the HF hub
[~] # pip3 install huggingface_hub
Collecting huggingface_hub
Downloading huggingface_hub-0.26.2-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: filelock in /opt/QPython312/lib/python3.12/site-packages (from huggingface_hub) (3.13.1)
...
Downloading fsspec-2024.10.0-py3-none-any.whl (179 kB)
Downloading PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (767 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 767.5/767.5 kB 1.7 MB/s eta 0:00:00
Downloading tqdm-4.67.0-py3-none-any.whl (78 kB)
...
Successfully installed certifi-2024.8.30 charset-normalizer-3.4.0 fsspec-2024.10.0 huggingface_hub-0.26.2 idna-3.10 pyyaml-6.0.2 requests-2.32.3 tqdm-4.67.0 typing-extensions-4.12.2 urllib3-2.2.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
c. Log in with the PAT (personal access token)
Use the token string saved in step 2: at the "Enter your token" prompt, right-click to paste it.
[~] # huggingface-cli login
_| _| _| _| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _|_|_|_| _|_| _|_|_| _|_|_|_|
_| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
_|_|_|_| _| _| _| _|_| _| _|_| _| _| _| _| _| _|_| _|_|_| _|_|_|_| _| _|_|_|
_| _| _| _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
_| _| _|_| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _| _| _| _|_|_| _|_|_|_|
To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible):
Add token as git credential? (Y/n) y
Token is valid (permission: fineGrained).
The token `HFT` has been saved to /root/.cache/huggingface/stored_tokens
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store' credential helper as default.
git config --global credential.helper store
Read https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage for more details.
Token has not been saved to git credential helper.
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `HFT`
final: Clone the t5 model with git again, successfully this time (here `hf` is an SSH host alias for huggingface.co configured in ~/.ssh/config).
[/share/Download/AI] # ssh -T hf
Hi DaveNian, welcome to Hugging Face.
[/share/Download/AI] # git clone hf:google-t5/t5-base
fatal: destination path 't5-base' already exists and is not an empty directory.
[/share/Download/AI] # rm -fR t5-base/
[/share/Download/AI] # git clone hf:google-t5/t5-base
Cloning into 't5-base'...
remote: Enumerating objects: 78, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 78 (delta 8), reused 3 (delta 3), pack-reused 65 (from 1)
Receiving objects: 100% (78/78), 972.59 KiB | 1.46 MiB/s, done.
Resolving deltas: 100% (34/34), done.
Filtering content: 100% (5/5), 4.15 GiB | 6.27 MiB/s, done.
[/share/Download/AI] #
Check the files after the download.
Practice:
Originally I wanted to write up how to use this model, but it does not support Chinese, so instead here are some notes for my future self.
Google: the T5-base model
An overview of the files inside the model directory:
C:\2024-MyProgramFiles\22.ai\t5-base>dir
Volume in drive C has no label.
Volume Serial Number is 5CA1-5BDC
Directory of C:\2024-MyProgramFiles\22.ai\t5-base
11/12/2024 08:16 PM <DIR> .
11/13/2024 01:00 AM <DIR> ..
11/12/2024 07:26 PM 1,208 config.json
11/12/2024 07:37 PM 891,625,348 flax_model.msgpack
11/12/2024 07:26 PM 147 generation_config.json
11/12/2024 07:37 PM 891,646,390 model.safetensors
11/12/2024 07:37 PM 891,691,430 pytorch_model.bin
11/12/2024 07:26 PM 8,477 README.md
11/12/2024 07:37 PM 891,679,884 rust_model.ot
11/12/2024 07:26 PM 791,656 spiece.model
11/12/2024 07:37 PM 892,146,080 tf_model.h5
11/12/2024 07:26 PM 1,389,353 tokenizer.json
10 File(s) 4,460,979,973 bytes
2 Dir(s) 85,797,109,760 bytes free
Weight files:
flax_model.msgpack, model.safetensors, pytorch_model.bin, rust_model.ot, tf_model.h5
These are the same weights, just in different formats:
No. | File name | Format | Use | Remarks |
1 | flax_model.msgpack | JAX/Flax framework format | Well suited to TPU workloads | Serialized with MessagePack |
2 | model.safetensors | Hugging Face safetensors format | Usable with PyTorch | Safer and faster to load |
3 | pytorch_model.bin | Native PyTorch format | The most commonly used format | Serialized with pickle |
4 | rust_model.ot | LibTorch (tch) weights for the Rust rust-bert library | Suited to production deployment | Often mislabeled as ONNX; it is the tch-rs format |
5 | tf_model.h5 | TensorFlow format | Stored as an HDF5 file | Used by Google's TensorFlow framework |
As the table shows, downloading just one of them is enough.
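The "safer" claim about model.safetensors comes from its layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes, with no pickled code that could execute on load. A toy sketch of that layout, built and parsed by hand (the tensor name and shape here are made up for illustration, not taken from t5-base):

```python
import json
import struct

# Build a tiny safetensors-style blob by hand: 8-byte header length,
# JSON header describing each tensor, then the raw tensor bytes.
header = {
    "shared.weight": {"dtype": "F32", "shape": [2, 2],
                      "data_offsets": [0, 16]},
}
header_bytes = json.dumps(header).encode("utf-8")
tensor_bytes = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # 16 raw float32 bytes

blob = struct.pack("<Q", len(header_bytes)) + header_bytes + tensor_bytes

# Reading it back is just length + JSON parsing; nothing is executed.
(n,) = struct.unpack("<Q", blob[:8])
parsed = json.loads(blob[8:8 + n])
print(parsed["shared.weight"]["shape"])  # → [2, 2]
```

By contrast, pytorch_model.bin is a pickle archive, which can in principle run arbitrary code when loaded, which is why safetensors is the preferred download.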
The other main files:
tokenizer.json: tokenizer configuration containing the vocabulary and tokenization rules
spiece.model: SentencePiece tokenizer model file, used for text preprocessing
generation_config.json: text-generation parameters; settings such as the number of beams can be adjusted here
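For example, generation_config.json is plain JSON, so the beam count can be tweaked with nothing but the standard library. This is a sketch; the field values below are assumptions for illustration, not copied from the real t5-base file:

```python
import json

# A minimal generation config (assumed values, for illustration only).
config = {
    "decoder_start_token_id": 0,
    "eos_token_id": 1,
    "pad_token_id": 0,
}

# Beam search usually improves translation quality at the cost of speed.
config["num_beams"] = 4
config["max_length"] = 64

text = json.dumps(config, indent=2)
print(text)
```

Writing `text` back to generation_config.json in the model directory would make these defaults apply to every `model.generate()` call.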
How transformers loads the model (using T5ForConditionalGeneration to load the weight file pytorch_model.bin):
T5 weight loading relies on the conventions and automation built into Hugging Face's transformers library; the following files must be present:
t5-base/
├── config.json             # model configuration
├── pytorch_model.bin       # model weights
├── tokenizer_config.json   # tokenizer configuration
├── tokenizer.json          # detailed tokenizer configuration
└── spiece.model            # tokenizer vocabulary file (when SentencePiece is used)
As long as the code points at that directory, the model loads (`./t5-base` below is an assumed path; use wherever you cloned the model):
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_path = "./t5-base"  # path to the cloned model directory
tokenizer = T5Tokenizer.from_pretrained(model_path, local_files_only=True)
model = T5ForConditionalGeneration.from_pretrained(
    model_path,
    local_files_only=True,
)
The transformers library will:
- first read the config.json file to get the model architecture and parameters
- then, based on that configuration, locate and load the weights from pytorch_model.bin
Exercise:
An English-to-French translation tool
Closing remarks:
There are still plenty of good third-party apps for QNAP; I have added several third-party app repositories to my NAS.
There used to be qnapclub, but it is no longer available.
Tags: google, t5, huggingface, 2024, NAS, base, git, model. From: https://blog.csdn.net/davenian/article/details/143728876