A Small Hands-On Exercise with Self-Attention

Date: 2023-12-24 21:12:48
Tags: small, Self, attention, 000, 0.00000000, np, array, 503, 501

Formula 1: Self-attention without weights

\[Attention(X) = softmax(\frac{X\cdot{X^T}}{\sqrt{dim_X}})\cdot X \]
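
For reference, the row-wise softmax in Formula 1 can be written out explicitly, together with the shift invariance that the example program relies on when it subtracts each row's maximum from `scores`. The symbols \(s\) (one row of the score matrix), \(n\) (its length), and \(c\) (an arbitrary constant) are notation introduced here for illustration.

\[softmax(s)_i = \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}, \qquad softmax(s - c)_i = \frac{e^{s_i - c}}{\sum_{j=1}^{n} e^{s_j - c}} = \frac{e^{-c}\, e^{s_i}}{e^{-c}\sum_{j=1}^{n} e^{s_j}} = softmax(s)_i \]

Choosing \(c = \max_j s_j\) makes every exponent non-positive, so `np.exp` cannot overflow while the result stays the same.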

Example program:

import numpy as np
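emb_dim = 3  # embedding size of each token (columns of X)
qkv_dim = 4  # projection size for Q/K/V (not used in this unweighted example)
seq_len = 5  # number of tokens (rows of X)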
X = np.array([
    [1501, 502, 503],
    [2502, 501, 503],
    [503, 501, 502],
    [503, 502, 501],
    [501, 503, 5020]
])
X
array([[1501,  502,  503],  
       [2502,  501,  503],  
       [ 503,  501,  502],  
       [ 503,  502,  501],  
       [ 501,  503, 5020]])
def softmax(mtx):
    return np.exp(mtx) / np.sum(np.exp(mtx), axis=-1, keepdims=True)
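# Scaled dot-product scores: X·X^T / sqrt(dim_X)
scores = X.dot(X.T) / np.sqrt(emb_dim)
# Subtract each row's max so np.exp in softmax cannot overflow (the softmax result is unchanged)
scores = scores - np.max(scores, axis=-1, keepdims=True)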
scores
array([[  -867179.52697255,         0.        ,  -1732629.31253861,
         -1732629.88988887,   -421723.19472849],
       [ -1445685.65140109,         0.        ,  -2887906.62383678,
         -2887907.77853732,  -1578157.51596611],
       [ -1019043.43237911,   -728635.09227619,  -1309448.88573069,
         -1309449.46308095,         0.        ],
       [ -1016436.11856345,   -726028.3558108 ,  -1306841.57191502,
         -1306840.99456476,         0.        ],
       [-12802651.57528769, -12513400.24512423, -13094514.2607187 ,
        -13097122.15188463,         0.        ]])
aw = softmax(scores)
aw
array([[0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1.]])
out = aw.dot(X)
out
array([[2502.,  501.,  503.],
       [2502.,  501.,  503.],
       [ 501.,  503., 5020.],
       [ 501.,  503., 5020.],
       [ 501.,  503., 5020.]])
np.sum(aw)
5.0
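
Every row of `aw` above is exactly one-hot because the shifted scores differ by hundreds of thousands, so every `exp` except the one for the row maximum underflows to 0. The sketch below contrasts this with smaller inputs. It rests on one assumption that is not in the original program: each column of `X` is standardized to zero mean and unit variance. The helper name `self_attention_unweighted` and the variables `X_std`, `aw_raw`, and `aw_std` are hypothetical names used only for illustration.

```python
import numpy as np

def softmax(mtx):
    # Shift by the row-wise max first so np.exp never overflows.
    shifted = mtx - np.max(mtx, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

def self_attention_unweighted(X):
    # Formula 1: softmax(X X^T / sqrt(dim_X)) X
    scores = X.dot(X.T) / np.sqrt(X.shape[-1])
    aw = softmax(scores)
    return aw, aw.dot(X)

X = np.array([
    [1501, 502, 503],
    [2502, 501, 503],
    [503, 501, 502],
    [503, 502, 501],
    [501, 503, 5020],
], dtype=float)

# Raw inputs: the attention weights saturate to one-hot rows.
aw_raw, _ = self_attention_unweighted(X)

# Assumption for illustration only: standardize each column of X.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
aw_std, out_std = self_attention_unweighted(X_std)

print(np.round(aw_raw, 3))
print(np.round(aw_std, 3))
```

With the standardized inputs, every token receives a nonzero share of attention, and each row of `aw_std` still sums to 1.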

Formula 2: Self-attention with weights

\[\begin{aligned} Attention(X) &= softmax(\frac{(X\cdot w^Q)\cdot (X\cdot w^K)^T}{\sqrt{dim^K}})\cdot (X\cdot w^V) \\ &= softmax(\frac{Q\cdot K^T}{\sqrt{dim^K}})\cdot V \end{aligned}\]

Example program:

import numpy as np
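emb_dim = 3  # embedding size of each token (columns of X, rows of wq/wk/wv)
qkv_dim = 4  # projection size for Q/K/V (columns of wq/wk/wv)
seq_len = 5  # number of tokens (rows of X)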
wq = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 1, 2, 3]
])
wk = np.array([
    [9, 8, 7, 6],
    [5, 4, 3, 2],
    [1, 9, 8, 7]
])
wv = np.array([
    [3, 6, 9, 7],
    [1, 8, 3, 6],
    [4, 5, 2, 2]
])
X = np.array([
    [1, 2, 3],
    [2, 2, 4],
    [5, 9, 7],
    [6, 6, 6],
    [8, 1, 4]
])
X
array([[1, 2, 3],
       [2, 2, 4],
       [5, 9, 7],
       [6, 6, 6],
       [8, 1, 4]])
def softmax(mtx):
    return np.exp(mtx) / np.sum(np.exp(mtx), axis=-1, keepdims=True)
Q = X.dot(wq)
Q
array([[ 38,  17,  23,  29],
       [ 48,  20,  28,  36],
       [113,  71,  92, 113],
       [ 90,  54,  72,  90],
       [ 49,  26,  39,  52]])
K = X.dot(wk)
K
array([[ 22,  43,  37,  31],
       [ 32,  60,  52,  44],
       [ 97, 139, 118,  97],
       [ 90, 126, 108,  90],
       [ 81, 104,  91,  78]])
V = X.dot(wv)
V
array([[ 17,  37,  21,  25],
       [ 24,  48,  32,  34],
       [ 52, 137,  86, 103],
       [ 48, 114,  84,  90],
       [ 41,  76,  83,  70]])
# Scaled dot-product scores: Q·K^T / sqrt(dim^K)
scores = Q.dot(K.T) / np.sqrt(qkv_dim)
scores
array([[ 1658.5,  2354. ,  5788. ,  5328. ,  4600.5],
       [ 2034. ,  2888. ,  7116. ,  6552. ,  5662. ],
       [ 6223. ,  8816. , 21323.5, 19611. , 16861.5],
       [ 4878. ,  6912. , 16731. , 15390. , 13239. ],
       [ 2625.5,  3722. ,  9006.5,  8289. ,  7139. ]])
# Subtract each row's max so np.exp in softmax cannot overflow (the softmax result is unchanged)
scores = scores - np.max(scores, axis=-1, keepdims=True)
scores
array([[ -4129.5,  -3434. ,      0. ,   -460. ,  -1187.5],
       [ -5082. ,  -4228. ,      0. ,   -564. ,  -1454. ],
       [-15100.5, -12507.5,      0. ,  -1712.5,  -4462. ],
       [-11853. ,  -9819. ,      0. ,  -1341. ,  -3492. ],
       [ -6381. ,  -5284.5,      0. ,   -717.5,  -1867.5]])
aw = softmax(scores)
aw
array([[0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        1.67702032e-200, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        1.14264732e-245, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        2.47576395e-312, 0.00000000e+000]])
out = aw.dot(V)
out
array([[ 52., 137.,  86., 103.],
       [ 52., 137.,  86., 103.],
       [ 52., 137.,  86., 103.],
       [ 52., 137.,  86., 103.],
       [ 52., 137.,  86., 103.]])
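
The steps above can be collected into a single function. This is a minimal sketch rather than a definitive implementation: the function name `self_attention` and its return convention (output first, attention weights second) are hypothetical choices, and the max-subtraction is folded into `softmax` instead of being applied to `scores` separately. The weights and inputs are the ones from the example program.

```python
import numpy as np

def softmax(mtx):
    # Shift by the row-wise max first so np.exp never overflows.
    shifted = mtx - np.max(mtx, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

def self_attention(X, wq, wk, wv):
    # Formula 2: softmax(Q K^T / sqrt(dim^K)) V
    Q = X.dot(wq)  # (seq_len, qkv_dim)
    K = X.dot(wk)  # (seq_len, qkv_dim)
    V = X.dot(wv)  # (seq_len, qkv_dim)
    scores = Q.dot(K.T) / np.sqrt(K.shape[-1])
    aw = softmax(scores)  # (seq_len, seq_len), each row sums to 1
    return aw.dot(V), aw

wq = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 1, 2, 3]])
wk = np.array([[9, 8, 7, 6], [5, 4, 3, 2], [1, 9, 8, 7]])
wv = np.array([[3, 6, 9, 7], [1, 8, 3, 6], [4, 5, 2, 2]])
X = np.array([[1, 2, 3], [2, 2, 4], [5, 9, 7], [6, 6, 6], [8, 1, 4]])

out, aw = self_attention(X, wq, wk, wv)
print(out)
```

Because the scores are again very large, every row of `aw` puts essentially all of its weight on the third token, so every row of `out` equals the third row of `V`, as in the output above.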

From: https://www.cnblogs.com/luoyicode/p/17924854.html