HF Transformers Pipelines
The Pipelines API
Task | Parameter | Description
---|---|---
sentiment-analysis | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | framework | Deep learning framework to use: "pt" for PyTorch, "tf" for TensorFlow.
 | device | Device to run on: -1 for CPU, 0 for the first GPU.
text-generation | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | max_length | Maximum length of the generated text.
 | do_sample | Whether to use sampling when generating text.
 | top_p | Controls the diversity of the generated text (nucleus sampling).
translation_en_to_fr | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | src_lang | Language of the input text.
 | tgt_lang | Desired output language.
question-answering | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | context | Text that provides the context.
 | question | Question to be answered.
text-classification | model | Model name or path to use.
 | tokenizer | Tokenizer name or path to use.
 | return_all_scores | Whether to return scores for all classes; defaults to False.
 | top_k | Return the k highest-scoring classes; defaults to 1.
The full list of tasks supported by Pipelines: https://huggingface.co/docs/transformers/task_summary
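To show how these parameters fit together, here is a minimal sketch of a text-generation call; the gpt2 checkpoint is only an illustrative choice, not one mandated by the table:
from transformers import pipeline

# Construction-time arguments: model, framework, device
generator = pipeline(
    "text-generation",
    model="gpt2",    # illustrative checkpoint; any causal LM works
    framework="pt",  # "pt" = PyTorch, "tf" = TensorFlow
    device=-1,       # -1 = CPU, 0 = first GPU
)
# Generation arguments such as max_length, do_sample, and top_p are passed at call time
generator("Once upon a time", max_length=30, do_sample=True, top_p=0.9)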
Customizing the transformers model download path
Calling pipeline downloads and caches the model. To customize where transformers stores these downloads, configure the cache path as follows:
import os

# Redirect the Hugging Face cache to a custom volume
os.environ['HF_HOME'] = '/mnt/new_volume/hf'
os.environ['HF_HUB_CACHE'] = '/mnt/new_volume/hf/hub'
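These variables should be set before transformers (or huggingface_hub) is imported, since the cache location is resolved at import time. As a quick sanity check, a sketch assuming huggingface_hub is installed (it is a dependency of transformers):
from huggingface_hub import constants

# Should print /mnt/new_volume/hf/hub if the variables above took effect
print(constants.HF_HUB_CACHE)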
The default model does not understand Chinese well, so the outputs below may be wrong.
from transformers import pipeline
# When only the task is specified, the default model is used (not recommended)
pipe = pipeline("sentiment-analysis")
pipe("哈尔滨好冷")  # "Harbin is freezing"
Output:
[{'label': 'NEGATIVE', 'score': 0.8832131028175354}]
API examples
pipe("这道菜味道不错")  # "This dish tastes good"
Output:
[{'label': 'NEGATIVE', 'score': 0.8870086669921875}]
# The clearly positive sentence above is misclassified. With English input, classification improves immediately:
pipe("You learn things really quickly. You understand the theory class as soon as it is taught.")
Output:
[{'label': 'POSITIVE', 'score': 0.9961802959442139}]
pipe("Today haerbin is really cold.")
Output:
[{'label': 'NEGATIVE', 'score': 0.999659538269043}]
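The poor Chinese results come from the default English checkpoint, not from the pipeline itself. Specifying a multilingual model explicitly fixes this; a minimal sketch, using the public nlptown/bert-base-multilingual-uncased-sentiment checkpoint as one example choice:
from transformers import pipeline

# A multilingual sentiment model handles Chinese input
pipe = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)
# Note: this model rates sentiment as "1 star" ... "5 stars" rather than POSITIVE/NEGATIVE
pipe("哈尔滨好冷")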
Batch inference
text_list = [
    "哈尔滨冰雪大世界很好玩",  # "Harbin Ice and Snow World is great fun"
    "I like Harbin",
    "You are very good at playing ball."
]
pipe(text_list)
Output:
[{'label': 'NEGATIVE', 'score': 0.84312504529953},
{'label': 'POSITIVE', 'score': 0.6818807125091553},
{'label': 'POSITIVE', 'score': 0.999847412109375}]
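As before, the first (clearly positive) Chinese sentence is misclassified by the default English model. For larger inputs, the pipeline can also batch its forward passes via the batch_size call argument (a sketch; the value 2 is arbitrary):
pipe(text_list, batch_size=2)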
Named Entity Recognition
from transformers import pipeline
classifier = pipeline(task="ner")
Output:
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
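To make the run reproducible and silence this warning, pin the model and revision named in the message above:
classifier = pipeline(
    task="ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    revision="4c53496",  # exact revision from the warning above
)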
preds = classifier("Hugging Face is a French company based in New York City.")
preds = [
{
"entity": pred["entity"],
"score": round(pred["score"], 4),
"index": pred["index"],
"word": pred["word"],
"start": pred["start"],
"end": pred["end"],
}
for pred in preds
]
print(*preds, sep="\n")
Output:
{'entity': 'I-ORG', 'score': 0.9968, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}
{'entity': 'I-ORG', 'score': 0.9293, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}
{'entity': 'I-ORG', 'score': 0.9763, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}
{'entity': 'I-MISC', 'score': 0.9983, 'index': 6, 'word': 'French', 'start': 18, 'end': 24}
{'entity': 'I-LOC', 'score': 0.999, 'index': 10, 'word': 'New', 'start': 42, 'end': 45}
{'entity': 'I-LOC', 'score': 0.9987, 'index': 11, 'word': 'York', 'start': 46, 'end': 50}
{'entity': 'I-LOC', 'score': 0.9992, 'index': 12, 'word': 'City', 'start': 51, 'end': 55}
Grouping entities
In the raw output above, subword pieces are tagged separately (the ## prefix marks a continuation token). Passing grouped_entities=True merges them into whole entities:
classifier = pipeline(task="ner", grouped_entities=True)
classifier("Hugging Face is a French company based in New York City.")
[{'entity_group': 'ORG',
'score': 0.96746373,
'word': 'Hugging Face',
'start': 0,
'end': 12},
{'entity_group': 'MISC',
'score': 0.99828726,
'word': 'French',
'start': 18,
'end': 24},
{'entity_group': 'LOC',
'score': 0.99896103,
'word': 'New York City',
'start': 42,
'end': 55}]
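In recent transformers releases, grouped_entities is deprecated in favor of the more flexible aggregation_strategy argument; the following should produce equivalent grouping:
classifier = pipeline(task="ner", aggregation_strategy="simple")
classifier("Hugging Face is a French company based in New York City.")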
Question Answering
from transformers import pipeline
question_answerer = pipeline(task="question-answering")
Output:
No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
preds = question_answerer(
question="What is the name of the repository?",
context="The name of the repository is huggingface/transformers",
)
print(
f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)
Output:
score: 0.9327, start: 30, end: 54, answer: huggingface/transformers
preds = question_answerer(
question="What is the capital of China?",
context="On 1 October 1949, CCP Chairman Mao Zedong formally proclaimed the People's Republic of China in Tiananmen Square, Beijing.",
)
print(
f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)
Output:
score: 0.9458, start: 115, end: 122, answer: Beijing
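The question-answering pipeline can also return several candidate answer spans via the top_k call argument (a sketch; with top_k greater than 1 the result is a list of dicts):
# Ask for the three best candidate spans instead of only the top one
preds = question_answerer(
    question="What is the name of the repository?",
    context="The name of the repository is huggingface/transformers",
    top_k=3,
)
for p in preds:
    print(p["answer"], p["score"])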