Let's look at the parameter settings used when loading a model.
import torch
# Initialize a half-precision tensor
h = torch.tensor([1.0, 2.0, 3.0], dtype=torch.half)
# h = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float16)  # same as the line above
# Check the data type
print(h.dtype)
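To make concrete what half precision costs and saves, here is a minimal sketch in plain PyTorch comparing per-element memory and showing the rounding that float16 introduces (no assumptions beyond the import above):

import torch

f32 = torch.tensor([1.0, 2.0, 3.0])            # default dtype is float32
f16 = f32.half()                                # convert to float16, same as .to(torch.float16)
print(f32.element_size(), f16.element_size())   # 4 vs 2 bytes per element -> half the memory
# float16 keeps only ~3 significant decimal digits, so values get rounded:
print(torch.tensor(1 / 3).item())                        # 0.3333333...
print(torch.tensor(1 / 3, dtype=torch.float16).item())   # ~0.33325, precision lost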
import accelerate      # must be installed for device_map='auto'
import bitsandbytes    # must be installed for load_in_8bit=True
from transformers import AutoTokenizer, AutoModelForCausalLM, TextIteratorStreamer
from transformers import AlbertTokenizer, AlbertModel
model = AlbertModel.from_pretrained('./albert', device_map='auto', torch_dtype=torch.float16, load_in_8bit=True, low_cpu_mem_usage=True)
# torch_dtype: the dtype of the model itself; if omitted it is inferred from the weight files.
#              It is determined by the checkpoint and is usually recorded in config.json.
# load_in_8bit: converts the model to 8-bit on load; this one you can set yourself.
print(1)
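To see where that default dtype comes from, you can inspect the checkpoint's config directly. A minimal sketch, assuming the same local './albert' checkpoint directory as above:

from transformers import AutoConfig

config = AutoConfig.from_pretrained('./albert')
# Mirrors the "torch_dtype" field in config.json (may be None for older checkpoints)
print(config.torch_dtype)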
-
low_cpu_mem_usage
algorithm: This is an experimental function that loads the model using ~1x model size CPU memory. Here is how it works:
1. save which state_dict keys we have
2. drop state_dict before the model is created, since the latter takes 1x model size CPU memory
3. after the model has been instantiated switch to the meta device all params/buffers that are going to be replaced from the loaded state_dict
4. load state_dict 2nd time
5. replace the params/buffers from the state_dict
Currently, it can't handle deepspeed ZeRO stage 3 and ignores loading errors
In short, when low_cpu_mem_usage is set to True, the loader:
saves the keys of the weight state_dict,
then drops the state_dict,
instantiates the model and puts the parameters that will be loaded onto the meta device,
then loads the state_dict again and swaps the real weights in.
This caps peak CPU memory at roughly one model size; enable it when RAM is tight.
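The key trick is the meta device: tensors created there have a shape and dtype but no backing storage, so the model skeleton costs almost no memory until the real weights replace them. A minimal sketch of the same pattern in plain PyTorch (the Linear layer and the random state_dict are just placeholders standing in for a real checkpoint; assign=True requires PyTorch >= 2.1):

import torch
import torch.nn as nn

# 1. Build the model skeleton on the meta device: no real memory is allocated.
with torch.device("meta"):
    model = nn.Linear(1024, 1024)
print(model.weight.device)  # meta

# 2. Pretend this state_dict was just loaded from a checkpoint on disk.
state_dict = {"weight": torch.randn(1024, 1024), "bias": torch.randn(1024)}

# 3. Swap the real tensors in; assign=True replaces the meta params with the
#    state_dict tensors instead of copying into them (copying into meta fails).
model.load_state_dict(state_dict, assign=True)
print(model.weight.device)  # cpu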