Reference: 保姆级教程,用PyTorch和BERT进行文本分类 (a step-by-step tutorial on text classification with PyTorch and BERT) - Zhihu (zhihu.com)
Model: https://huggingface.co/bert-base-cased
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertModel.from_pretrained("bert-base-cased")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
output:
BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[ 0.6023,  0.1092,  0.1417,  ..., -0.4177,  0.6059,  0.1764],
         [ 0.5119, -0.4770,  0.5508,  ..., -0.2814,  0.3793,  0.1156],
         [ 0.0995,  0.0867,  0.0869,  ...,  0.4789, -0.3236,  0.3122],
         ...,
         [ 0.8081, -0.7380,  0.2001,  ...,  0.7405, -0.7998,  0.6449],
         [ 0.3305, -0.1958,  0.3148,  ..., -0.0525,  0.5358,  0.1987],
         [ 0.5655, -0.2176, -0.4720,  ..., -0.3554,  0.6141, -0.2476]]],
       grad_fn=<NativeLayerNormBackward0>), pooler_output=tensor([[-8.2835e-01,  6.0906e-01,  9.9998e-01, -9.9831e-01,  9.8739e-01,
         9.2930e-01,  9.9664e-01, -9.8909e-01, -9.9124e-01, -7.8317e-01, ...
(output truncated)
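For context before looking at the output object: encoded_input is just a dict-like BatchEncoding holding the tensors BERT expects. A minimal sketch of how to inspect it (reusing the tokenizer and encoded_input variables from the snippet above; the prints are only illustrative):

# the tokenizer returns input_ids, token_type_ids and attention_mask as PyTorch tensors
print(list(encoded_input.keys()))
# ['input_ids', 'token_type_ids', 'attention_mask']

# map the ids back to WordPiece tokens; note the special [CLS] and [SEP]
# tokens that the tokenizer adds around the text
print(tokenizer.convert_ids_to_tokens(encoded_input['input_ids'][0].tolist()))

# both tensors are [batch_size, sequence_length]; batch_size is 1 here
print(encoded_input['input_ids'].shape)
print(encoded_input['attention_mask'].shape)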
The result is a BaseModelOutput object (printed above as BaseModelOutputWithPoolingAndCrossAttentions).

BaseModelOutput is a basic output type in the Hugging Face Transformers library, used to store the outputs of a pretrained model. It typically contains the following attributes:
- last_hidden_state: the hidden states of the model's last layer for the input sequence. For BERT this contains the hidden representation of every token in the input. Its shape is [batch_size, sequence_length, hidden_size], where batch_size is the batch size of the input, sequence_length is the length of the tokenized input sequence, and hidden_size is the dimensionality of the hidden states (768 for bert-base-cased).
- pooler_output: the representation produced by the pooling layer, typically used as a summary of the whole input sequence; for BERT it is the [CLS] token's hidden state passed through a dense layer and a tanh activation. Its shape is [batch_size, hidden_size].
- hidden_states: the hidden states of every layer of the model. For bert-base this is a tuple of 13 tensors (the embedding output plus the outputs of all 12 Transformer layers), each with the same shape as last_hidden_state, i.e. [batch_size, sequence_length, hidden_size]. It is only returned when the model is called with output_hidden_states=True, as shown in the sketch after this list.
- attentions: the attention weights of every layer, with shape [batch_size, num_heads, sequence_length, sequence_length]. Note that some models do not include attention weights in their output; for BERT they are only returned when output_attentions=True is passed.
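As referenced in the list above, here is a minimal sketch that requests hidden_states and attentions explicitly and checks the shapes described there. It reuses model, tokenizer and encoded_input from the first snippet; the recomputation of pooler_output via model.pooler is only an illustrative sanity check, not part of the original tutorial:

import torch

# hidden_states and attentions are only returned when explicitly requested
output = model(**encoded_input, output_hidden_states=True, output_attentions=True)

print(output.last_hidden_state.shape)  # [batch_size, sequence_length, hidden_size], hidden_size = 768
print(output.pooler_output.shape)      # [batch_size, hidden_size]

print(len(output.hidden_states))       # 13 for bert-base: embedding output + 12 Transformer layers
print(output.hidden_states[-1].shape)  # same shape as last_hidden_state

print(len(output.attentions))          # 12: one tensor per Transformer layer
print(output.attentions[0].shape)      # [batch_size, num_heads, sequence_length, sequence_length], num_heads = 12

# pooler_output is the [CLS] hidden state passed through BERT's pooler (dense layer + tanh)
cls_hidden = output.last_hidden_state[:, 0]
manual_pooled = torch.tanh(model.pooler.dense(cls_hidden))
print(torch.allclose(manual_pooled, output.pooler_output, atol=1e-6))  # should print True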