I recently needed to refactor how position_ids and attention_mask are built, which required getting familiar with a few NumPy operations. Here are some examples.
Concatenating several lower-triangular matrices into one block-diagonal mask:
import numpy as np
from scipy.linalg import block_diag

A = np.ones((2, 2))
B = np.ones((3, 3))
b = [A, B]
# block_diag places A and B along the diagonal; tril keeps the lower triangle.
print(np.tril(block_diag(*b)))
[[1. 0. 0. 0. 0.]
[1. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 1. 1. 0.]
[0. 0. 1. 1. 1.]]
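The same idea extends to causal masks for packed sequences. Applying np.tril per block before block_diag gives an identical result, since block_diag already zeroes the cross-block entries. A minimal sketch (the sequence lengths here are made up for illustration):

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical lengths of two sequences packed into one row.
lengths = [2, 3]
# Build a causal mask per sequence, then assemble block-diagonally.
blocks = [np.tril(np.ones((n, n))) for n in lengths]
mask = block_diag(*blocks)
# Same as taking tril of the block-diagonal of all-ones blocks,
# because the off-block entries are zero either way.
ref = np.tril(block_diag(*[np.ones((n, n)) for n in lengths]))
print(np.array_equal(mask, ref))  # True
```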
Padding 2-D position_ids:
encoded_inputs = {}
encoded_inputs["position_ids"] = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])
difference = 4
encoded_inputs["position_ids"] = np.pad(
    encoded_inputs["position_ids"], pad_width=[(0, 0), (difference, 0)]
)
print(encoded_inputs)
{'position_ids': array([[0, 0, 0, 0, 1, 2, 3, 4],
[0, 0, 0, 0, 4, 5, 6, 7]])}
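In practice `difference` is usually the target length minus the current sequence length. A small sketch of that pattern (max_length is an assumed parameter, not from any particular library):

```python
import numpy as np

max_length = 8  # assumed target length for illustration
position_ids = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])
difference = max_length - position_ids.shape[-1]
# (0, 0) leaves the batch dim untouched; (difference, 0) left-pads with zeros.
padded = np.pad(position_ids, pad_width=[(0, 0), (difference, 0)])
print(padded.shape)  # (2, 8)
```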
Padding an attention_mask:
encoded_inputs["attention_mask"] = np.zeros((1, 3, 3))
encoded_inputs["attention_mask"] = np.pad(
    encoded_inputs["attention_mask"],
    pad_width=[(0, 0), (difference, 0), (difference, 0)],
    mode="constant",
    constant_values=1,
)
print(encoded_inputs)
{'position_ids': array([[0, 0, 0, 0, 1, 2, 3, 4],
[0, 0, 0, 0, 4, 5, 6, 7]]), 'attention_mask': array([[[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 0., 0., 0.],
[1., 1., 1., 1., 0., 0., 0.],
[1., 1., 1., 1., 0., 0., 0.]]])}
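Note that constant_values=1 marks the padded positions as masked, following the convention (also used by ChatGLM below) that 1 means "do not attend". A quick check of the resulting structure, with the same shapes as the example above:

```python
import numpy as np

difference = 4
mask = np.zeros((1, 3, 3))  # 0 = attend, for the 3 real tokens
padded = np.pad(
    mask,
    pad_width=[(0, 0), (difference, 0), (difference, 0)],
    mode="constant",
    constant_values=1,  # padded rows/columns are masked out
)
print(padded.shape)  # (1, 7, 7)
```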
Concatenating multiple lists inside a list:
inputs = [[1, 2], [0, 1]]
# inputs = [[[1,2],[0,1]],[[1,2,3,4],[4,5,6,7]]]
# out = sum(inputs, [])  # pure-Python alternative for flat lists
out = np.concatenate(inputs, axis=-1)
print(out)
[1 2 0 1]
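For plain Python lists the same flattening works without NumPy: sum(inputs, []) concatenates them, though itertools.chain is the more idiomatic (and faster, for many lists) choice. A small sketch:

```python
import itertools

inputs = [[1, 2], [0, 1]]
flat_sum = sum(inputs, [])                   # repeated list concatenation
flat_chain = list(itertools.chain(*inputs))  # lazy chaining, then materialize
print(flat_sum == flat_chain)  # True
```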
Creating the attention_mask used in ChatGLM:
seq_length = 4
context_length = 2
attention_mask = np.ones((seq_length, seq_length))
attention_mask = np.tril(attention_mask)
attention_mask[:, :context_length] = 1
attention_mask = (attention_mask < 0.5).astype("int64")
print(attention_mask)
[[0 0 1 1]
[0 0 1 1]
[0 0 0 1]
[0 0 0 0]]
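The same construction wrapped in a function, to make the roles of the two lengths explicit (the function name is mine, not from ChatGLM's source):

```python
import numpy as np

def build_chatglm_mask(seq_length, context_length):
    # Start from a causal (lower-triangular) mask: 1 = may attend.
    mask = np.tril(np.ones((seq_length, seq_length)))
    # Every position may attend to the full prompt context.
    mask[:, :context_length] = 1
    # Invert so that 1 marks positions that must NOT be attended to.
    return (mask < 0.5).astype("int64")

# Reproduces the matrix printed above.
print(build_chatglm_mask(4, 2))
```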
I've noticed that the way attention_mask and position_ids are constructed varies from one LLM to another, while the rest of the inputs are largely the same, so I'm sharing these notes here in the hope that they're useful to others as well.