Attention Is All You Need

* Authors: [[Ashish Vaswani]], [[Noam Shazeer]], [[Niki Parmar]], [[Jakob Uszkoreit]], [[Llion Jones]], [[Aidan N. Gomez]], [[Lukasz Kaiser]], [[Illia Polosukhin]]

DOI: 10.48550/ARXIV.1706.03762

First impressions

comment:: A classic sequence-to-sequence model that relies solely on the attention mechanism.

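Motivation

At the time, sequence models usually relied on RNNs.
Drawbacks of RNNs: the time steps are computed one after another, which makes parallelization hard, and the memory overhead is large.
Earlier work mostly used attention to pass information from the encoder to the decoder.

A model built purely on attention is far more parallelizable.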

Method

Mask Attention


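The mask prevents position t from seeing anything that comes after t. Concretely, the attention scores (entries of QKᵀ) between the query at time t and every key after time t are replaced with a very large negative number; after the softmax the corresponding weights become 0, so the values after time t are masked out.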
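A minimal pure-Python sketch of this masking step, assuming scaled dot-product scores QKᵀ/√d_k as in the paper. The helper names (`masked_attention_weights`, `apply_to_values`) and the toy vectors are hypothetical, and -1e9 stands in for the "very large negative number".

```python
import math

NEG_INF = -1e9  # stands in for -infinity ("a very large negative number")


def softmax(row):
    # Subtract the row maximum for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_weights(q, k):
    """Causal (look-ahead) masked attention weights.

    q, k: lists of vectors (one per time step), each of dimension d_k.
    Returns an n x n matrix whose row t only puts weight on keys 0..t.
    """
    d_k = len(q[0])
    n = len(q)
    weights = []
    for t in range(n):
        row = []
        for j in range(n):
            if j > t:
                row.append(NEG_INF)  # future position: score replaced, weight -> 0
            else:
                dot = sum(a * b for a, b in zip(q[t], k[j]))
                row.append(dot / math.sqrt(d_k))
        weights.append(softmax(row))
    return weights


def apply_to_values(weights, v):
    # Each output is a weighted sum of value vectors; masked weights are 0,
    # so values at future steps contribute nothing.
    return [[sum(w * v[j][c] for j, w in enumerate(row)) for c in range(len(v[0]))]
            for row in weights]


q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = masked_attention_weights(q, k)
print(w[0])  # [1.0, 0.0, 0.0]: step 0 can only see itself
```

Masking the scores before the softmax, rather than zeroing the weights afterwards, keeps every row a proper probability distribution over the visible positions.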

multi-head self-attention mechanism


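Each head uses its own learned projection matrices W to project Q, K and V into a different representation subspace; the outputs of all heads are then concatenated and projected once more.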
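The multi-head idea can be sketched in plain Python as below. This is an illustrative sketch, not the paper's implementation: the projection matrices are random and untrained, d_k = d_model / h is assumed as in the paper, and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    # (n x m) @ (m x p) -> (n x p)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = len(q[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    return matmul([softmax(r) for r in scores], v)


def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for learned weights.
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, h, d_model, rng):
    """Multi-head self-attention on x (n x d_model) with h heads."""
    d_k = d_model // h
    heads = []
    for _ in range(h):
        # Each head has its own W_Q, W_K, W_V projecting d_model -> d_k.
        w_q, w_k, w_v = (random_matrix(d_model, d_k, rng) for _ in range(3))
        heads.append(attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)))
    # Concatenate the heads along the feature axis: n x (h * d_k) = n x d_model.
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    # Final output projection W_O.
    w_o = random_matrix(h * d_k, d_model, rng)
    return matmul(concat, w_o)


rng = random.Random(0)
x = random_matrix(4, 8, rng)  # 4 tokens, d_model = 8
out = multi_head_attention(x, h=2, d_model=8, rng=rng)
print(len(out), len(out[0]))  # 4 8
```

Because each head works in a d_k = d_model / h dimensional subspace, the total cost stays close to that of a single full-width head.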

position encoding

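Attention by itself carries no order information: shuffling the input tokens only permutes the rows and columns of the affinity matrix without changing any of its values, so each token's output stays the same and merely moves to the token's new position.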
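A small check of this claim, assuming the simplest possible self-attention with Q = K = V = X (no learned projections and no positional encoding). The function name and the toy input are hypothetical.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x):
    # Simplest self-attention: Q = K = V = x, no projections, no positions.
    d = len(x[0])
    out = []
    for qi in x:
        w = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in x])
        out.append([sum(wj * kj[c] for wj, kj in zip(w, x)) for c in range(d)])
    return out


x = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
perm = [2, 0, 1]  # shuffle the token order
shuffled = [x[p] for p in perm]

out = self_attention(x)
out_shuffled = self_attention(shuffled)
# Each token gets the same output as before, just at its new position.
print(all(
    all(abs(a - b) < 1e-12 for a, b in zip(out_shuffled[i], out[p]))
    for i, p in enumerate(perm)
))  # True
```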
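Positions are encoded with sine and cosine functions of different periods, and the resulting positional encoding is simply added to the corresponding input vector.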
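A sketch of the sinusoidal encoding, using the formula from the original paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). d_model is assumed to be even, and the function names are hypothetical.

```python
import math


def positional_encoding(n_positions, d_model):
    """Sinusoidal positional encoding (d_model assumed even).

    PE[pos][2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(n_positions)]
    for pos in range(n_positions):
        for i in range(0, d_model, 2):  # i is the even index 2i
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(embeddings):
    # The encoding is added element-wise to the input embeddings.
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(row, pe_row)] for row, pe_row in zip(embeddings, pe)]


pe = positional_encoding(4, 6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]: sin(0) = 0, cos(0) = 1
```

Adding the encoding rather than concatenating it keeps the model width at d_model.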
Overall architecture


From： https://www.cnblogs.com/tifuhong/p/17909227.html
