
Fine-tuning ChatGLM2 on Alibaba Cloud


The complete code is available at: https://files.cnblogs.com/files/lijiale/chatglm2-6b.zip?t=1691571940&download=true

# %% [markdown]
# # Before fine-tuning

# %%
model_path = "/mnt/workspace/ChatGLM2-6B/chatglm2-6b"

from transformers import AutoTokenizer, AutoModel
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

from IPython.display import display, Markdown, clear_output

# Stream the model's reply and render it incrementally as Markdown in the notebook
def display_answer(model, query, history=[]):
    for response, history in model.stream_chat(
            tokenizer, query, history=history):
        clear_output(wait=True)
        display(Markdown(response))
    return history

model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()

display_answer(model, "类型#上衣\*材质#牛仔布\*颜色#白色\*风格#简约\*图案#刺绣\*衣样式#外套\*衣款式#破洞")


# %% [markdown]
# # After fine-tuning
# Load the P-tuning v2 prefix weights produced by the official `ptuning` scripts and run the same query.

# %%
import os
import torch
from transformers import AutoConfig
from transformers import AutoTokenizer, AutoModel

model_path = "/mnt/workspace/ChatGLM2-6B/chatglm2-6b"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(model_path, config=config, trust_remote_code=True)
# Load the P-tuning v2 prefix weights from the fine-tuned checkpoint
prefix_state_dict = torch.load(os.path.join("/mnt/workspace/ChatGLM2-6B/ptuning/output/adgen-chatglm2-6b-pt-128-2e-2/checkpoint-3000", "pytorch_model.bin"))
# Keep only the prefix-encoder weights and strip the "transformer.prefix_encoder." prefix from their names
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

model = model.half().cuda()
model.transformer.prefix_encoder.float()  # keep the prefix encoder in fp32 while the backbone runs in fp16
model = model.eval()

response, history = model.chat(tokenizer, "类型#上衣\*材质#牛仔布\*颜色#白色\*风格#简约\*图案#刺绣\*衣样式#外套\*衣款式#破洞", history=[])
print(response)
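
# %% [markdown]
# Optionally, the `display_answer` helper from the "Before fine-tuning" cell can be reused here
# to stream the P-tuned model's reply for the same query and compare it with the base model's output.

# %%
display_answer(model, "类型#上衣\*材质#牛仔布\*颜色#白色\*风格#简约\*图案#刺绣\*衣样式#外套\*衣款式#破洞")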


# %%
!pip install torchkeras

# %%
# Import modules
import numpy as np
import pandas as pd 
import torch
from torch import nn 
from torch.utils.data import Dataset,DataLoader 

from argparse import Namespace
cfg = Namespace()

#dataset
cfg.prompt_column = 'prompt'
cfg.response_column = 'response'
cfg.history_column = None
cfg.source_prefix = '' # prefix text prepended to every prompt

cfg.max_source_length = 128
cfg.max_target_length = 128

#model
cfg.model_name_or_path = '/mnt/workspace/ChatGLM2-6B/chatglm2-6b'  # or the Hugging Face hub id 'THUDM/chatglm2-6b'
cfg.quantization_bit = None # 4 or 8; quantization is only recommended for inference

#train
cfg.epochs = 100
cfg.lr = 5e-3
cfg.batch_size = 1
cfg.gradient_accumulation_steps = 16 # gradient accumulation
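
# Note: with batch_size = 1 and gradient_accumulation_steps = 16, the effective batch size
# per optimizer step is 1 * 16 = 16.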




import transformers
from transformers import  AutoModel,AutoTokenizer,AutoConfig,DataCollatorForSeq2Seq


config = AutoConfig.from_pretrained(cfg.model_name_or_path, trust_remote_code=True)
 

tokenizer = AutoTokenizer.from_pretrained(
    cfg.model_name_or_path, trust_remote_code=True)

model = AutoModel.from_pretrained(cfg.model_name_or_path,config=config,
                                  trust_remote_code=True).half() 

# Quantize first to reduce the memory footprint (optional)
if cfg.quantization_bit is not None:
    print(f"Quantized to {cfg.quantization_bit} bit")
    model = model.quantize(cfg.quantization_bit)

# Then move the model to the GPU
model = model.cuda()

 
# torchkeras registers a Jupyter cell magic, which makes it convenient to test ChatGLM in the notebook
from torchkeras.chat import ChatGLM 
chatglm = ChatGLM(model, tokenizer)





# %%
%%chatglm
类型#上衣\*材质#牛仔布\*颜色#白色\*风格#简约\*图案#刺绣\*衣样式#外套\*衣款式#破洞


# %%
# Define a single knowledge sample
import json
keyword = '梦中情炉'
 

description = '''梦中情炉一般指的是炼丹工具torchkeras。
这是一个通用的pytorch模型训练模版工具。
torchkeras是一个三好炼丹炉:好看,好用,好改。
她有torch的灵动,也有keras的优雅,并且她的美丽,无与伦比。
所以她的作者一个有毅力的吃货给她取了一个别名叫做梦中情炉。'''

# Apply some simple data augmentation to the prompts for better convergence.
def get_prompt_list(keyword):
    return [f'{keyword}', 
            f'你知道{keyword}吗?',
            f'{keyword}是什么?',
            f'介绍一下{keyword}',
            f'你听过{keyword}吗?',
            f'啥是{keyword}?',
 
            f'{keyword}是何物?',
            f'何为{keyword}?',
           ]

# data = [{'prompt':x, 'response':description} for x in get_prompt_list(keyword)]
data = []
# Each line of train.json is a JSON object with "content" (prompt) and "summary" (response)
with open("/mnt/workspace/ChatGLM2-6B/ptuning/AdvertiseGen_Simple/train.json", "r", encoding="utf-8") as f:
    lines = f.readlines()
    for line in lines:
        d = json.loads(line)
        data.append({'prompt':d['content'],'response':d['summary']})
    

dfdata = pd.DataFrame(data)
display(dfdata) 
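
# %% [markdown]
# For reference, a minimal sketch of what one line of `train.json` is assumed to look like
# (the values below are made up purely for illustration; real AdvertiseGen entries differ):

# %%
import json
sample_line = '{"content": "类型#上衣*材质#牛仔布*颜色#白色", "summary": "一款简约的白色牛仔上衣"}'
print(json.loads(sample_line))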

# %%
import datasets 
# Use the same data for the training and validation sets
ds_train_raw = ds_val_raw = datasets.Dataset.from_pandas(dfdata)

# %%
def preprocess(examples):
    max_seq_length = cfg.max_source_length + cfg.max_target_length
    model_inputs = {
        "input_ids": [],
        "labels": [],
    }
    for i in range(len(examples[cfg.prompt_column])):
        if examples[cfg.prompt_column][i] and examples[cfg.response_column][i]:
            query, answer = examples[cfg.prompt_column][i], examples[cfg.response_column][i]

            history = examples[cfg.history_column][i] if cfg.history_column is not None else None
            prompt = tokenizer.build_prompt(query, history)
 

            prompt = cfg.source_prefix + prompt
            a_ids = tokenizer.encode(text=prompt, add_special_tokens=True, truncation=True,
                                     max_length=cfg.max_source_length)
            b_ids = tokenizer.encode(text=answer, add_special_tokens=False, truncation=True,
                                     max_length=cfg.max_target_length)

            context_length = len(a_ids)
            input_ids = a_ids + b_ids + [tokenizer.eos_token_id]
            # Mask the prompt part of the labels so only the response contributes to the loss
            labels = [tokenizer.pad_token_id] * context_length + b_ids + [tokenizer.eos_token_id]

            pad_len = max_seq_length - len(input_ids)
            input_ids = input_ids + [tokenizer.pad_token_id] * pad_len
            labels = labels + [tokenizer.pad_token_id] * pad_len
            # Replace pad tokens in the labels with -100 (ignored by the loss)
            labels = [(l if l != tokenizer.pad_token_id else -100) for l in labels]
 
            model_inputs["input_ids"].append(input_ids)
            model_inputs["labels"].append(labels)
    return model_inputs

ds_train = ds_train_raw.map(
    preprocess,
    batched=True,
    num_proc=4,
    remove_columns=ds_train_raw.column_names
)

ds_val = ds_val_raw.map(
    preprocess,
    batched=True,
    num_proc=4,
    remove_columns=ds_val_raw.column_names
)


data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=None,
    label_pad_token_id=-100,
    pad_to_multiple_of=None,
    padding=False
)

dl_train = DataLoader(ds_train, batch_size = cfg.batch_size,
                      num_workers = 2, shuffle = True, collate_fn = data_collator)
dl_val = DataLoader(ds_val, batch_size = cfg.batch_size,
                    num_workers = 2, shuffle = False, collate_fn = data_collator)

# Grab one batch to make sure the pipeline works, then print the number of batches
for batch in dl_train:
    break
print(len(dl_train))
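
# %% [markdown]
# Quick optional sanity check: in each preprocessed example the prompt and padding positions of
# `labels` are set to -100, so only the response tokens contribute to the loss.

# %%
sample = ds_train[0]
n_ignored = sum(1 for l in sample["labels"] if l == -100)
print(f"sequence length: {len(sample['input_ids'])}, ignored label positions: {n_ignored}")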



# %%
!pip install peft

# %%
from peft import get_peft_model, AdaLoraConfig, TaskType

# Save GPU memory during training (disable the KV cache, enable gradient checkpointing)
model.config.use_cache = False

model.supports_gradient_checkpointing = True
model.gradient_checkpointing_enable()
model.enable_input_require_grads()

peft_config = AdaLoraConfig(
    task_type=TaskType.CAUSAL_LM, inference_mode=False,
    r=8,
    lora_alpha=32, lora_dropout=0.1,
    # ChatGLM2 fuses Q/K/V into a single linear projection named "query_key_value"
    target_modules=["query_key_value"]
)

peft_model = get_peft_model(model, peft_config)

peft_model.is_parallelizable = True
peft_model.model_parallel = True
 
peft_model.print_trainable_parameters()
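
# %% [markdown]
# Optional check (a small sketch, not part of the original tutorial): confirm that the assumed
# target module name `query_key_value` actually exists among the wrapped model's modules, i.e.
# the AdaLoRA adapters were attached where expected.

# %%
print(any("query_key_value" in name for name, _ in peft_model.named_modules()))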

# %%
from torchkeras import KerasModel 
from accelerate import Accelerator 

class StepRunner:
    def __init__(self, net, loss_fn, accelerator=None, stage = "train", metrics_dict = None, 
                 optimizer = None, lr_scheduler = None
                 ):
 
        self.net,self.loss_fn,self.metrics_dict,self.stage = net,loss_fn,metrics_dict,stage
        self.optimizer,self.lr_scheduler = optimizer,lr_scheduler
        self.accelerator = accelerator if accelerator is not None else Accelerator() 
        if self.stage=='train':
            self.net.train() 
        else:
            self.net.eval()
    
    def __call__(self, batch):
        
        #loss
        with self.accelerator.autocast():
            loss = self.net(input_ids=batch["input_ids"],labels=batch["labels"]).loss

        #backward()
 
        if self.optimizer is not None and self.stage=="train":
            self.accelerator.backward(loss)
            if self.accelerator.sync_gradients:
                self.accelerator.clip_grad_norm_(self.net.parameters(), 1.0)
            self.optimizer.step()
            if self.lr_scheduler is not None:
                self.lr_scheduler.step()
            self.optimizer.zero_grad()
            
        all_loss = self.accelerator.gather(loss).sum()
        
        #losses (or plain metrics that can be averaged)
        step_losses = {self.stage+"_loss":all_loss.item()}
        
        #metrics (stateful metrics)
 
        step_metrics = {}
        
        if self.stage=="train":
            if self.optimizer is not None:
                step_metrics['lr'] = self.optimizer.state_dict()['param_groups'][0]['lr']
            else:
                step_metrics['lr'] = 0.0
        return step_losses,step_metrics
    
KerasModel.StepRunner = StepRunner 


# Save only the trainable LoRA parameters
def save_ckpt(self, ckpt_path='checkpoint', accelerator = None):
    unwrap_net = accelerator.unwrap_model(self.net)
 
    unwrap_net.save_pretrained(ckpt_path)
    
def load_ckpt(self, ckpt_path='checkpoint'):
    self.net = self.net.from_pretrained(self.net.base_model.model,ckpt_path)
    self.from_scratch = False
    
KerasModel.save_ckpt = save_ckpt 
KerasModel.load_ckpt = load_ckpt 


# %%
optimizer = torch.optim.AdamW(peft_model.parameters(),lr=cfg.lr) 
keras_model = KerasModel(peft_model,loss_fn = None,
        optimizer=optimizer) 
ckpt_path = 'single_chatglm3'

# %%
keras_model.fit(train_data = dl_train,
                val_data = dl_val,
                epochs=100,
                patience=20,
                monitor='val_loss',
                mode='min',
                ckpt_path = ckpt_path,
                mixed_precision='fp16',
                gradient_accumulation_steps = cfg.gradient_accumulation_steps
               )
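
# %% [markdown]
# The patched `save_ckpt` above is expected to write only the LoRA adapter weights
# (via `save_pretrained`) to `ckpt_path` whenever the monitored `val_loss` improves.
# A quick optional look at what was saved:

# %%
import os
print(os.listdir(ckpt_path))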


# %%
# Validate the fine-tuned model
from peft import PeftModel 
ckpt_path = 'single_chatglm3'
model_old = AutoModel.from_pretrained(cfg.model_name_or_path,
                                  load_in_8bit=False, 
                                  trust_remote_code=True)
peft_loaded = PeftModel.from_pretrained(model_old, ckpt_path).cuda()
model_new = peft_loaded.merge_and_unload() # merge the LoRA weights into the base model

chatglm = ChatGLM(model_new, tokenizer, max_chat_rounds=20) # supports multi-turn dialogue; knowledge can be drawn from earlier turns in the conversation


# %%
chatglm = ChatGLM(model_new, tokenizer, max_chat_rounds=0) # max_chat_rounds=0 disables history, so each query is answered independently

# %%
%%chatglm
类型#上衣\*材质#牛仔布\*颜色#白色\*风格#简约\*图案#刺绣\*衣样式#外套\*衣款式#破洞

# %%
save_path = "chatglm2-6b-adgen"
model_new.save_pretrained(save_path, max_shard_size='2GB')
tokenizer.save_pretrained(save_path)

# %%
# Copy ChatGLM2's custom modeling/tokenizer .py files into the export directory so it can be loaded with trust_remote_code=True
!cp ChatGLM2-6B/chatglm2-6b/*.py chatglm2-6b-adgen/

# %%
from transformers import AutoModel, AutoTokenizer
model_name = "chatglm2-6b-adgen"
tokenizer = AutoTokenizer.from_pretrained(
    model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name,
        trust_remote_code=True).half().cuda()
response,history = model.chat(tokenizer,query = '你听说过梦中情炉吗?',history = [])
print(response)

# %%
response,history = model.chat(tokenizer,query = '类型#上衣\*材质#牛仔布\*颜色#白色\*风格#简约\*图案#刺绣\*衣样式#外套\*衣款式#破洞',history = [])
print(response)

# %%

From: https://www.cnblogs.com/lijiale/p/17617364.html
