First, import the required torch package.
import torch
We set the following (see the short code sketch right after this list):
- seq_len (maximum sequence length): 5
- batch_size (batch size): 2
- d_model (dimension of the vector each word is mapped to): 10
- heads (number of heads in the multi-head attention): 5
- d_k (number of features per head): 2
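A minimal sketch that puts these settings into Python variables (the variable names are only for illustration; the walkthrough below keeps using the literal numbers):
seq_len = 5              # maximum sequence length
batch_size = 2           # batch size
d_model = 10             # embedding dimension of each word
heads = 5                # number of attention heads
d_k = d_model // heads   # features per head -> 2
assert d_model == heads * d_k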
1. Input shape: [seq_len, batch_size, d_model]
input_tensor = torch.randn(5, 2, 10)
This is the tensor fed into the model; its shape is [seq_len, batch_size, d_model].
input_tensor
# Output
'''
tensor([[[-0.0564, -0.4915, 0.1572, 0.1950, -0.1457, 1.5368, 1.1635,
0.6610, -0.6690, -1.2407],
[ 0.2209, -1.2430, 1.3436, -1.6678, -2.6158, 1.3362, -0.3048,
-0.3680, -1.0620, 0.6028]],
[[ 0.5403, 2.8848, -1.1311, 0.6405, -2.8894, 0.0266, 0.3725,
-1.4034, -1.0788, -0.9796],
[ 0.0851, 0.3146, 1.1101, -0.1377, -0.3531, -0.6355, -1.1008,
0.4649, -1.0249, 2.3093]],
[[ 0.9773, 0.0954, -0.9705, 0.3955, -0.8477, -0.5051, 1.5252,
2.4351, -0.3550, -0.7516],
[ 0.8564, 1.3546, -0.0192, -1.3067, 0.2836, -0.2337, -0.9309,
-0.9528, 0.1533, 0.1920]],
[[-0.7944, 0.0292, 0.1796, -0.1784, 0.4151, -1.7918, 2.2592,
-0.3511, -0.6939, -0.7411],
[-1.5070, -0.0961, 0.1144, 0.1852, 0.9209, 0.8497, 0.0394,
-0.3052, -0.8078, -0.9088]],
[[-0.6081, -0.7641, -0.4355, 0.1307, 0.8386, -0.3480, -0.6175,
-1.2444, -0.6881, 0.7320],
[-2.1062, -0.3705, -1.5179, 1.7906, -0.3040, 1.8528, 2.8797,
1.2698, 0.2206, 0.4556]]])
'''
'''
[[-0.0564, -0.4915, 0.1572, 0.1950, -0.1457, 1.5368, 1.1635,
0.6610, -0.6690, -1.2407],
[ 0.2209, -1.2430, 1.3436, -1.6678, -2.6158, 1.3362, -0.3048,
-0.3680, -1.0620, 0.6028]]
Each of these vectors is the vector representation of one word, and the two words sit at the same position in two different sentences. Why two sentences? Because the batch size is 2.
In every block like this, the first vector is the word vector from the first sentence and the second vector is the word vector from the second sentence; as the batch size grows, the same pattern continues. (A quick check follows right after this block.)
'''
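A quick check of the reading above (a minimal sketch; the shapes are deterministic even though the values are random):
input_tensor[0].shape     # torch.Size([2, 10]): position 0 of sentence 1 and of sentence 2
input_tensor[:, 0].shape  # torch.Size([5, 10]): all five positions of sentence 1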
a1 = input_tensor.view(5, 2, 5, 2)
We reshape the input so that it matches what the model expects (here, for example, multi-head attention).
The shape must become [seq_len, batch_size, heads, d_k], where $d\_model = heads \times d\_k$.
a1
# Output
'''
tensor([[[[-0.0564, -0.4915],
[ 0.1572, 0.1950],
[-0.1457, 1.5368],
[ 1.1635, 0.6610],
[-0.6690, -1.2407]],
[[ 0.2209, -1.2430],
[ 1.3436, -1.6678],
[-2.6158, 1.3362],
[-0.3048, -0.3680],
[-1.0620, 0.6028]]],
[[[ 0.5403, 2.8848],
[-1.1311, 0.6405],
[-2.8894, 0.0266],
[ 0.3725, -1.4034],
[-1.0788, -0.9796]],
[[ 0.0851, 0.3146],
[ 1.1101, -0.1377],
[-0.3531, -0.6355],
[-1.1008, 0.4649],
[-1.0249, 2.3093]]],
[[[ 0.9773, 0.0954],
[-0.9705, 0.3955],
[-0.8477, -0.5051],
[ 1.5252, 2.4351],
[-0.3550, -0.7516]],
[[ 0.8564, 1.3546],
[-0.0192, -1.3067],
[ 0.2836, -0.2337],
[-0.9309, -0.9528],
[ 0.1533, 0.1920]]],
[[[-0.7944, 0.0292],
[ 0.1796, -0.1784],
[ 0.4151, -1.7918],
[ 2.2592, -0.3511],
[-0.6939, -0.7411]],
[[-1.5070, -0.0961],
[ 0.1144, 0.1852],
[ 0.9209, 0.8497],
[ 0.0394, -0.3052],
[-0.8078, -0.9088]]],
[[[-0.6081, -0.7641],
[-0.4355, 0.1307],
[ 0.8386, -0.3480],
[-0.6175, -1.2444],
[-0.6881, 0.7320]],
[[-2.1062, -0.3705],
[-1.5179, 1.7906],
[-0.3040, 1.8528],
[ 2.8797, 1.2698],
[ 0.2206, 0.4556]]]])
'''
'''
input_tensor:
[[-0.0564, -0.4915, 0.1572, 0.1950, -0.1457, 1.5368, 1.1635,
0.6610, -0.6690, -1.2407],
[ 0.2209, -1.2430, 1.3436, -1.6678, -2.6158, 1.3362, -0.3048,
-0.3680, -1.0620, 0.6028]]
a1:
[[[-0.0564, -0.4915],
[ 0.1572, 0.1950],
[-0.1457, 1.5368],
[ 1.1635, 0.6610],
[-0.6690, -1.2407]],
[[ 0.2209, -1.2430],
[ 1.3436, -1.6678],
[-2.6158, 1.3362],
[-0.3048, -0.3680],
[-1.0620, 0.6028]]]
Comparing the data in input_tensor and a1, we can see that each word vector of input_tensor is split in a1 into a matrix of five rows and two columns. In other words, the features of a single word are divided into five groups of two features each. In terms of multi-head attention, this means we have five heads, and each head handles two features. (A quick check follows right after this block.)
'''
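A quick check of this interpretation (a minimal sketch; the True result holds regardless of the random values):
a1.shape                                               # torch.Size([5, 2, 5, 2]) -> [seq_len, batch_size, heads, d_k]
torch.equal(a1[0, 0], input_tensor[0, 0].view(5, 2))   # True: word 0 of sentence 1, split into 5 heads x 2 features each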
2. Input shape: [batch_size, seq_len, d_model]
output_tensor = input_tensor.transpose(0, 1)
This is the tensor fed into the model; its shape is [batch_size, seq_len, d_model].
output_tensor
# Output
'''
tensor([[[-0.0564, -0.4915, 0.1572, 0.1950, -0.1457, 1.5368, 1.1635,
0.6610, -0.6690, -1.2407],
[ 0.5403, 2.8848, -1.1311, 0.6405, -2.8894, 0.0266, 0.3725,
-1.4034, -1.0788, -0.9796],
[ 0.9773, 0.0954, -0.9705, 0.3955, -0.8477, -0.5051, 1.5252,
2.4351, -0.3550, -0.7516],
[-0.7944, 0.0292, 0.1796, -0.1784, 0.4151, -1.7918, 2.2592,
-0.3511, -0.6939, -0.7411],
[-0.6081, -0.7641, -0.4355, 0.1307, 0.8386, -0.3480, -0.6175,
-1.2444, -0.6881, 0.7320]],
[[ 0.2209, -1.2430, 1.3436, -1.6678, -2.6158, 1.3362, -0.3048,
-0.3680, -1.0620, 0.6028],
[ 0.0851, 0.3146, 1.1101, -0.1377, -0.3531, -0.6355, -1.1008,
0.4649, -1.0249, 2.3093],
[ 0.8564, 1.3546, -0.0192, -1.3067, 0.2836, -0.2337, -0.9309,
-0.9528, 0.1533, 0.1920],
[-1.5070, -0.0961, 0.1144, 0.1852, 0.9209, 0.8497, 0.0394,
-0.3052, -0.8078, -0.9088],
[-2.1062, -0.3705, -1.5179, 1.7906, -0.3040, 1.8528, 2.8797,
1.2698, 0.2206, 0.4556]]])
'''
'''
[[-0.0564, -0.4915, 0.1572, 0.1950, -0.1457, 1.5368, 1.1635,
0.6610, -0.6690, -1.2407],
[ 0.5403, 2.8848, -1.1311, 0.6405, -2.8894, 0.0266, 0.3725,
-1.4034, -1.0788, -0.9796],
[ 0.9773, 0.0954, -0.9705, 0.3955, -0.8477, -0.5051, 1.5252,
2.4351, -0.3550, -0.7516],
[-0.7944, 0.0292, 0.1796, -0.1784, 0.4151, -1.7918, 2.2592,
-0.3511, -0.6939, -0.7411],
[-0.6081, -0.7641, -0.4355, 0.1307, 0.8386, -0.3480, -0.6175,
-1.2444, -0.6881, 0.7320]]
Within this block, each vector is the word vector of one word, and all five word vectors belong to the same sentence. Since there are two such blocks, there are two sentences with five words each.
The five vectors in the first block are all the word vectors of the first sentence, and the five vectors in the second block are all the word vectors of the second sentence; as the batch size grows, the same pattern continues. (A quick check follows right after this block.)
'''
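A quick check of this reading (a minimal sketch; the True result does not depend on the random values):
torch.equal(output_tensor[0], input_tensor[:, 0])   # True: block 0 of output_tensor holds all five word vectors of sentence 1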
a2 = output_tensor.view(2, 5, 5, 2)
We reshape the input so that it matches what the model expects (here, for example, multi-head attention).
The shape must become [batch_size, seq_len, heads, d_k], where $d\_model = heads \times d\_k$.
a2
# Output
'''
tensor([[[[-0.0564, -0.4915],
[ 0.1572, 0.1950],
[-0.1457, 1.5368],
[ 1.1635, 0.6610],
[-0.6690, -1.2407]],
[[ 0.5403, 2.8848],
[-1.1311, 0.6405],
[-2.8894, 0.0266],
[ 0.3725, -1.4034],
[-1.0788, -0.9796]],
[[ 0.9773, 0.0954],
[-0.9705, 0.3955],
[-0.8477, -0.5051],
[ 1.5252, 2.4351],
[-0.3550, -0.7516]],
[[-0.7944, 0.0292],
[ 0.1796, -0.1784],
[ 0.4151, -1.7918],
[ 2.2592, -0.3511],
[-0.6939, -0.7411]],
[[-0.6081, -0.7641],
[-0.4355, 0.1307],
[ 0.8386, -0.3480],
[-0.6175, -1.2444],
[-0.6881, 0.7320]]],
[[[ 0.2209, -1.2430],
[ 1.3436, -1.6678],
[-2.6158, 1.3362],
[-0.3048, -0.3680],
[-1.0620, 0.6028]],
[[ 0.0851, 0.3146],
[ 1.1101, -0.1377],
[-0.3531, -0.6355],
[-1.1008, 0.4649],
[-1.0249, 2.3093]],
[[ 0.8564, 1.3546],
[-0.0192, -1.3067],
[ 0.2836, -0.2337],
[-0.9309, -0.9528],
[ 0.1533, 0.1920]],
[[-1.5070, -0.0961],
[ 0.1144, 0.1852],
[ 0.9209, 0.8497],
[ 0.0394, -0.3052],
[-0.8078, -0.9088]],
[[-2.1062, -0.3705],
[-1.5179, 1.7906],
[-0.3040, 1.8528],
[ 2.8797, 1.2698],
[ 0.2206, 0.4556]]]])
'''
'''
output_tensor:
[[-0.0564, -0.4915, 0.1572, 0.1950, -0.1457, 1.5368, 1.1635,
0.6610, -0.6690, -1.2407],
[ 0.5403, 2.8848, -1.1311, 0.6405, -2.8894, 0.0266, 0.3725,
-1.4034, -1.0788, -0.9796],
[ 0.9773, 0.0954, -0.9705, 0.3955, -0.8477, -0.5051, 1.5252,
2.4351, -0.3550, -0.7516],
[-0.7944, 0.0292, 0.1796, -0.1784, 0.4151, -1.7918, 2.2592,
-0.3511, -0.6939, -0.7411],
[-0.6081, -0.7641, -0.4355, 0.1307, 0.8386, -0.3480, -0.6175,
-1.2444, -0.6881, 0.7320]]
a2:
[[[-0.0564, -0.4915],
[ 0.1572, 0.1950],
[-0.1457, 1.5368],
[ 1.1635, 0.6610],
[-0.6690, -1.2407]],
[[ 0.5403, 2.8848],
[-1.1311, 0.6405],
[-2.8894, 0.0266],
[ 0.3725, -1.4034],
[-1.0788, -0.9796]],
[[ 0.9773, 0.0954],
[-0.9705, 0.3955],
[-0.8477, -0.5051],
[ 1.5252, 2.4351],
[-0.3550, -0.7516]],
[[-0.7944, 0.0292],
[ 0.1796, -0.1784],
[ 0.4151, -1.7918],
[ 2.2592, -0.3511],
[-0.6939, -0.7411]],
[[-0.6081, -0.7641],
[-0.4355, 0.1307],
[ 0.8386, -0.3480],
[-0.6175, -1.2444],
[-0.6881, 0.7320]]],
Comparing the data in output_tensor and a2, we can see that the five word vectors in one block of output_tensor (i.e., one sentence) are each split in a2 into a matrix of five rows and two columns, giving five such matrices per sentence. In other words, the features of every word in the sentence are divided into five groups of two features each. In terms of multi-head attention, this means we have five heads, and each head handles two features. (A short note on the usual next step follows right after this block.)
'''
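For reference, a typical multi-head attention implementation (this step is not part of the example above, and the variable name a2_heads is purely illustrative) would usually transpose a2 once more so that the head dimension comes before the sequence dimension, letting each head attend over a [seq_len, d_k] slice of every sentence:
a2_heads = a2.transpose(1, 2)   # [batch_size, heads, seq_len, d_k] -> torch.Size([2, 5, 5, 2])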
At this point, the example has shown the difference between the two kinds of input and what each of them means. Note that the two input shapes are equivalent: they differ only in the order of the dimensions. In practice, we can choose the shape of the input tensor according to what the specific neural-network model expects.
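A one-line check of that equivalence (the True result holds regardless of the random values): swapping the first two dimensions of a1 gives exactly a2.
torch.equal(a1.transpose(0, 1), a2)   # True: [seq_len, batch_size, heads, d_k] vs. [batch_size, seq_len, heads, d_k]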