Here comes the first complete demo of fine-tuning a pretrained model with Hugging Face Transformers!
The overall steps are:
Load the dataset
from datasets import load_dataset
raw_datasets = load_dataset("glue", "mrpc")
Tokenize the dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
def tokenize_function(examples):
    return tokenizer(examples["sentence1"], examples["sentence2"], truncation=True)
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
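With batched=True, map passes tokenize_function a dictionary of columns, each holding a list of values, rather than one example at a time. That is why examples["sentence1"] and examples["sentence2"] can be handed to the tokenizer directly. The sketch below mimics that calling convention in plain Python. Note that toy_tokenize and toy_batched_map are hypothetical stand-ins invented for this illustration, not the datasets or transformers implementation, and the whitespace split is only an assumption standing in for a real subword tokenizer.

```python
def toy_tokenize(first, second):
    """Hypothetical stand-in for a tokenizer called on two lists of sentences."""
    tokens = []
    for a, b in zip(first, second):
        # A real tokenizer produces subword ids; here a "token" is simply a
        # lowercased word, and the two sentences are joined with marker tokens.
        tokens.append(["[CLS]"] + a.lower().split() + ["[SEP]"] + b.lower().split() + ["[SEP]"])
    return {"tokens": tokens}


def toy_batched_map(columns, function, batch_size=2):
    """Hypothetical stand-in for map(..., batched=True)."""
    n_rows = len(next(iter(columns.values())))
    new_columns = {}
    for start in range(0, n_rows, batch_size):
        # Each call sees a slice of every column, i.e. a dict of lists.
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in function(batch).items():
            new_columns.setdefault(name, []).extend(values)
    # Keep the original columns and add the ones the function returned.
    return {**columns, **new_columns}


rows = {
    "sentence1": ["The cat sat .", "He left early .", "It rained ."],
    "sentence2": ["A cat was sitting .", "He stayed late .", "Rain fell ."],
}
tokenized = toy_batched_map(rows, lambda ex: toy_tokenize(ex["sentence1"], ex["sentence2"]))
print(tokenized["tokens"][2])
```

The function is called twice here (rows 0 to 1, then row 2), and the returned lists are concatenated back into one new column alongside the original ones.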
Set up the DataCollator
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer, padding=True)
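The tokenizer was called without padding, so the encoded examples have different lengths. The collator pads each batch to the length of its longest member at the moment the batch is assembled. The sketch below imitates that behavior in plain Python. Note that toy_pad_batch is a hypothetical helper written for this illustration, not the transformers implementation, and the pad id 0 and the token ids are made-up values.

```python
def toy_pad_batch(features, pad_token_id=0):
    """Pad every sequence in one batch to that batch's longest length."""
    longest = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        ids = list(f["input_ids"])
        n_pad = longest - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
    return batch


# Two encoded examples of different lengths (ids are made up for illustration).
features = [
    {"input_ids": [101, 7592, 102]},
    {"input_ids": [101, 7592, 2088, 999, 102]},
]
batch = toy_pad_batch(features)
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding per batch rather than to one global maximum keeps batches of short sentences small, which is the usual reason for padding in the collator instead of at tokenization time.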
Build the model
From: https://blog.csdn.net/weixin_43636694/article/details/144783351