
Self-attention: a small hands-on exercise

Posted: 2023-12-24 21:12:48


Formula 1: self-attention without weight matrices

\[Attention(X) = softmax(\frac{X\cdot{X^T}}{\sqrt{dim_X}})\cdot X \]

Example program:

import numpy as np
emb_dim = 3
qkv_dim = 4
seq_len = 5
X = np.array([
    [1501, 502, 503],
    [2502, 501, 503],
    [503, 501, 502],
    [503, 502, 501],
    [501, 503, 5020]
])
X
array([[1501,  502,  503],  
       [2502,  501,  503],  
       [ 503,  501,  502],  
       [ 503,  502,  501],  
       [ 501,  503, 5020]])
def softmax(mtx):
    # normalize along the last axis so each row of weights sums to 1
    return np.exp(mtx) / np.sum(np.exp(mtx), axis=-1, keepdims=True)
scores = X.dot(X.T) / np.sqrt(emb_dim)  # pairwise dot-product scores, scaled by sqrt of the embedding dimension
scores = scores - np.max(scores, axis=-1, keepdims=True)  # shift each row by its max so exp does not overflow
scores
array([[  -867179.52697255,         0.        ,  -1732629.31253861,
         -1732629.88988887,   -421723.19472849],
       [ -1445685.65140109,         0.        ,  -2887906.62383678,
         -2887907.77853732,  -1578157.51596611],
       [ -1019043.43237911,   -728635.09227619,  -1309448.88573069,
         -1309449.46308095,         0.        ],
       [ -1016436.11856345,   -726028.3558108 ,  -1306841.57191502,
         -1306840.99456476,         0.        ],
       [-12802651.57528769, -12513400.24512423, -13094514.2607187 ,
        -13097122.15188463,         0.        ]])
aw = softmax(scores)
aw
array([[0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1.]])
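Every row of aw collapses to a one-hot vector because the shifted scores are on the order of -10^5 to -10^6, far below the point (roughly -745 for float64) where np.exp underflows to zero, so only the entry at each row's maximum (shifted to 0) survives normalization. A quick check, not part of the original walkthrough:

np.exp(-421723.19472849)   # the least-negative nonzero entry in row 0 still underflows
0.0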
out = aw.dot(X)
out
array([[2502.,  501.,  503.],
       [2502.,  501.,  503.],
       [ 501.,  503., 5020.],
       [ 501.,  503., 5020.],
       [ 501.,  503., 5020.]])
np.sum(aw)
5.0
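The np.sum(aw) check above confirms that each of the 5 rows of attention weights sums to 1. The steps can also be packaged into one reusable function; below is a minimal sketch of Formula 1 under the same assumptions as the example (the name self_attention is only illustrative):

import numpy as np

def self_attention(X):
    # Formula 1: softmax(X @ X.T / sqrt(dim_X)) @ X, with a numerically stable softmax
    dim_x = X.shape[-1]                                        # embedding dimension of each token
    scores = X @ X.T / np.sqrt(dim_x)                          # scaled pairwise dot products
    scores = scores - np.max(scores, axis=-1, keepdims=True)   # shift rows so exp does not overflow
    aw = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)
    return aw @ X

self_attention(X)   # with X as defined above, this reproduces the out array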

Formula 2: self-attention with weight matrices

\[Attention(X) = softmax(\frac{(X\cdot w^Q)\cdot{(X\cdot w^K)^T}}{\sqrt{dim_K}})\cdot (X\cdot w^V)\\\,\\ = softmax(\frac{Q\cdot K^T}{\sqrt{dim_K}})\cdot V \]

Example program:

import numpy as np
emb_dim = 3
qkv_dim = 4
seq_len = 5
wq = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 1, 2, 3]
])
wk = np.array([
    [9, 8, 7, 6],
    [5, 4, 3, 2],
    [1, 9, 8, 7]
])
wv = np.array([
    [3, 6, 9, 7],
    [1, 8, 3, 6],
    [4, 5, 2, 2]
])
X = np.array([
    [1, 2, 3],
    [2, 2, 4],
    [5, 9, 7],
    [6, 6, 6],
    [8, 1, 4]
])
X
array([[1, 2, 3],
       [2, 2, 4],
       [5, 9, 7],
       [6, 6, 6],
       [8, 1, 4]])
def softmax(mtx):
    return np.exp(mtx) / np.sum(np.exp(mtx), axis=-1, keepdims=True)
Q = X.dot(wq)  # project X into query space
Q
array([[ 38,  17,  23,  29],
       [ 48,  20,  28,  36],
       [113,  71,  92, 113],
       [ 90,  54,  72,  90],
       [ 49,  26,  39,  52]])
K = X.dot(wk)  # project X into key space
K
array([[ 22,  43,  37,  31],
       [ 32,  60,  52,  44],
       [ 97, 139, 118,  97],
       [ 90, 126, 108,  90],
       [ 81, 104,  91,  78]])
V = X.dot(wv)  # project X into value space
V
array([[ 17,  37,  21,  25],
       [ 24,  48,  32,  34],
       [ 52, 137,  86, 103],
       [ 48, 114,  84,  90],
       [ 41,  76,  83,  70]])
scores = Q.dot(K.T) / np.sqrt(qkv_dim)
scores
array([[ 1658.5,  2354. ,  5788. ,  5328. ,  4600.5],
       [ 2034. ,  2888. ,  7116. ,  6552. ,  5662. ],
       [ 6223. ,  8816. , 21323.5, 19611. , 16861.5],
       [ 4878. ,  6912. , 16731. , 15390. , 13239. ],
       [ 2625.5,  3722. ,  9006.5,  8289. ,  7139. ]])
scores = scores - np.max(scores, axis=-1, keepdims=True)
scores
array([[ -4129.5,  -3434. ,      0. ,   -460. ,  -1187.5],
       [ -5082. ,  -4228. ,      0. ,   -564. ,  -1454. ],
       [-15100.5, -12507.5,      0. ,  -1712.5,  -4462. ],
       [-11853. ,  -9819. ,      0. ,  -1341. ,  -3492. ],
       [ -6381. ,  -5284.5,      0. ,   -717.5,  -1867.5]])
aw = softmax(scores)
aw
array([[0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        1.67702032e-200, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        1.14264732e-245, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
        2.47576395e-312, 0.00000000e+000]])
out = aw.dot(V)
out
array([[ 52., 137.,  86., 103.],
       [ 52., 137.,  86., 103.],
       [ 52., 137.,  86., 103.],
       [ 52., 137.,  86., 103.],
       [ 52., 137.,  86., 103.]])
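Here too the softmax saturates: every row attends almost entirely to the third token, so every output row equals V[2] = [52, 137, 86, 103]. The projection and attention steps can be wrapped into a single function. This is a minimal sketch under the same assumptions (the names stable_softmax and qkv_attention are only illustrative); Formula 1 is the special case in which wq, wk and wv are identity matrices and dim_K equals the embedding dimension:

import numpy as np

def stable_softmax(mtx):
    # subtract each row's max before exponentiating so large scores do not overflow
    mtx = mtx - np.max(mtx, axis=-1, keepdims=True)
    e = np.exp(mtx)
    return e / np.sum(e, axis=-1, keepdims=True)

def qkv_attention(X, wq, wk, wv):
    # Formula 2: softmax(Q @ K.T / sqrt(dim_K)) @ V with Q = X @ wq, K = X @ wk, V = X @ wv
    Q, K, V = X.dot(wq), X.dot(wk), X.dot(wv)
    dim_k = K.shape[-1]
    aw = stable_softmax(Q.dot(K.T) / np.sqrt(dim_k))
    return aw.dot(V)

qkv_attention(X, wq, wk, wv)   # with the arrays defined above, this reproduces the out array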

From: https://www.cnblogs.com/luoyicode/p/17924854.html

    ADeformableAttentionNetworkforHigh-ResolutionRemoteSensingImagesSemanticSegmentation*Authors:[[RenxiangZuo]],[[GuangyunZhang]],[[RongtingZhang]],[[XiupingJia]]DOI:10.1109/TGRS.2021.3119537初读印象comment::(MDANet)提出了可变形注意力,结......