HF Transformers Pipelines
The Pipelines API
Task | Parameter | Description
---|---|---
sentiment-analysis | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | framework | Deep learning framework to use: "pt" for PyTorch, "tf" for TensorFlow.
 | device | Device to run on: -1 for CPU, 0 for the first GPU.
text-generation | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | max_length | Maximum length of the generated text.
 | do_sample | Whether to use sampling when generating text.
 | top_p | Controls the diversity of the generated text (nucleus sampling).
translation_en_to_fr | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | src_lang | Language of the input text.
 | tgt_lang | Desired output language.
question-answering | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | context | Text that provides the context.
 | question | Question to be answered.
text-classification | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | return_all_scores | Whether to return scores for all classes; defaults to False.
 | top_k | Return the k highest-scoring classes; defaults to 1.
The full list of tasks supported by Pipelines: https://huggingface.co/docs/transformers/task_summary
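To show how these parameters fit together, here is a minimal sketch of a text-generation call; the gpt2 checkpoint is only an illustrative choice, not one mandated by the table:
from transformers import pipeline

# Construction-time arguments: model, framework, device
generator = pipeline(
    "text-generation",
    model="gpt2",    # illustrative checkpoint; any causal LM works
    framework="pt",  # "pt" = PyTorch, "tf" = TensorFlow
    device=-1,       # -1 = CPU, 0 = first GPU
)
# Generation arguments such as max_length, do_sample, and top_p are passed at call time
generator("Once upon a time", max_length=30, do_sample=True, top_p=0.9)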
Customizing the transformers model download path
Calling pipeline downloads and caches the model. To customize where transformers stores these downloads, configure the cache path as follows:
import os

# Redirect the Hugging Face cache to a custom volume
os.environ['HF_HOME'] = '/mnt/new_volume/hf'
os.environ['HF_HUB_CACHE'] = '/mnt/new_volume/hf/hub'
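These variables should be set before transformers (or huggingface_hub) is imported, since the cache location is resolved at import time. As a quick sanity check, a sketch assuming huggingface_hub is installed (it is a dependency of transformers):
from huggingface_hub import constants

# Should print /mnt/new_volume/hf/hub if the variables above took effect
print(constants.HF_HUB_CACHE)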
The default model does not understand Chinese well, so the outputs below may be wrong.
from transformers import pipeline
# When only the task is specified, the default model is used (not recommended)
pipe = pipeline("sentiment-analysis")
pipe("哈尔滨好冷")  # "Harbin is freezing"
Output:
[{'label': 'NEGATIVE', 'score': 0.8832131028175354}]
API examples
pipe("这道菜味道不错")  # "This dish tastes good"
Output:
[{'label': 'NEGATIVE', 'score': 0.8870086669921875}]
# The clearly positive sentence above is misclassified. With English input, classification improves immediately:
pipe("You learn things really quickly. You understand the theory class as soon as it is taught.")
Output:
[{'label': 'POSITIVE', 'score': 0.9961802959442139}]
pipe("Today haerbin is really cold.")
Output:
[{'label': 'NEGATIVE', 'score': 0.999659538269043}]
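The poor Chinese results come from the default English checkpoint, not from the pipeline itself. Specifying a multilingual model explicitly fixes this; a minimal sketch, using the public nlptown/bert-base-multilingual-uncased-sentiment checkpoint as one example choice:
from transformers import pipeline

# A multilingual sentiment model handles Chinese input
pipe = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)
# Note: this model rates sentiment as "1 star" ... "5 stars" rather than POSITIVE/NEGATIVE
pipe("哈尔滨好冷")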
Batch inference
text_list = [
    "哈尔滨冰雪大世界很好玩",  # "Harbin Ice and Snow World is great fun"
    "I like Harbin",
    "You are very good at playing ball."
]
pipe(text_list)
Output:
[{'label': 'NEGATIVE', 'score': 0.84312504529953},
{'label': 'POSITIVE', 'score': 0.6818807125091553},
{'label': 'POSITIVE', 'score': 0.999847412109375}]
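As before, the first (clearly positive) Chinese sentence is misclassified by the default English model. For larger inputs, the pipeline can also batch its forward passes via the batch_size call argument (a sketch; the value 2 is arbitrary):
pipe(text_list, batch_size=2)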
Named Entity Recognition
from transformers import pipeline
classifier = pipeline(task="ner")
Output:
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
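To make the run reproducible and silence this warning, pin the model and revision named in the message above:
classifier = pipeline(
    task="ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    revision="4c53496",  # exact revision from the warning above
)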
preds = classifier("Hugging Face is a French company based in New York City.")
preds = [
{
"entity": pred["entity"],
"score": round(pred["score"], 4),
"index": pred["index"],
"word": pred["word"],
"start": pred["start"],
"end": pred["end"],
}
for pred in preds
]
print(*preds, sep="\n")
Output:
{'entity': 'I-ORG', 'score': 0.9968, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}
{'entity': 'I-ORG', 'score': 0.9293, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}
{'entity': 'I-ORG', 'score': 0.9763, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}
{'entity': 'I-MISC', 'score': 0.9983, 'index': 6, 'word': 'French', 'start': 18, 'end': 24}
{'entity': 'I-LOC', 'score': 0.999, 'index': 10, 'word': 'New', 'start': 42, 'end': 45}
{'entity': 'I-LOC', 'score': 0.9987, 'index': 11, 'word': 'York', 'start': 46, 'end': 50}
{'entity': 'I-LOC', 'score': 0.9992, 'index': 12, 'word': 'City', 'start': 51, 'end': 55}
Grouping entities
In the raw output above, subword pieces are tagged separately (the ## prefix marks a continuation token). Passing grouped_entities=True merges them into whole entities:
classifier = pipeline(task="ner", grouped_entities=True)
classifier("Hugging Face is a French company based in New York City.")
[{'entity_group': 'ORG',
'score': 0.96746373,
'word': 'Hugging Face',
'start': 0,
'end': 12},
{'entity_group': 'MISC',
'score': 0.99828726,
'word': 'French',
'start': 18,
'end': 24},
{'entity_group': 'LOC',
'score': 0.99896103,
'word': 'New York City',
'start': 42,
'end': 55}]
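In recent transformers releases, grouped_entities is deprecated in favor of the more flexible aggregation_strategy argument; the following should produce equivalent grouping:
classifier = pipeline(task="ner", aggregation_strategy="simple")
classifier("Hugging Face is a French company based in New York City.")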
Question Answering
from transformers import pipeline
question_answerer = pipeline(task="question-answering")
Output:
No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
preds = question_answerer(
question="What is the name of the repository?",
context="The name of the repository is huggingface/transformers",
)
print(
f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)
Output:
score: 0.9327, start: 30, end: 54, answer: huggingface/transformers
preds = question_answerer(
question="What is the capital of China?",
context="On 1 October 1949, CCP Chairman Mao Zedong formally proclaimed the People's Republic of China in Tiananmen Square, Beijing.",
)
print(
f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)
Output:
score: 0.9458, start: 115, end: 122, answer: Beijing
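The question-answering pipeline can also return several candidate answer spans via the top_k call argument (a sketch; with top_k greater than 1 the result is a list of dicts):
# Ask for the three best candidate spans instead of only the top one
preds = question_answerer(
    question="What is the name of the repository?",
    context="The name of the repository is huggingface/transformers",
    top_k=3,
)
for p in preds:
    print(p["answer"], p["score"])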