【ECCV2022】DaViT: Dual Attention Vision Transformers

Posted: 2022-11-18 19:44:21

Tags: attention, Transformers, ECCV2022, Attention, global, shown, dimension, paper, channel



Code: https://github.com/dingmyu/davit

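The idea behind this paper is natural and easy to arrive at. A Transformer processes 2D data of shape P×C, where P is the number of tokens and C is the feature (channel) dimension. Conventional methods compute attention along the P dimension, so can attention also be computed along the C dimension? It certainly can.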
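To make the P-versus-C distinction concrete, here is a minimal pure-Python sketch (not the authors' code). It simplifies by using the input itself as Q, K and V (single head, no learned projections), and the 1/sqrt(C) and 1/sqrt(P) scale factors are illustrative choices rather than the paper's exact settings. Attention over tokens yields a P×P map; attention over channels, obtained by transposing X, yields a C×C map; both return a P×C output.

```python
import math


def matmul(a, b):
    """(n x k) @ (k x m) -> (n x m) on nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def token_attention(x):
    """Attention along the P (token) axis: the map is P x P, the output is P x C."""
    c = len(x[0])
    scores = [[v / math.sqrt(c) for v in row] for row in matmul(x, transpose(x))]
    attn = softmax_rows(scores)
    return attn, matmul(attn, x)


def channel_attention(x):
    """Attention along the C (channel) axis: treat the C columns of X as the 'tokens'."""
    xt = transpose(x)                      # C x P
    p = len(x)
    scores = [[v / math.sqrt(p) for v in row] for row in matmul(xt, x)]  # C x C
    attn = softmax_rows(scores)
    return attn, transpose(matmul(attn, xt))  # C x P, transposed back to P x C


X = [[0.1 * (i + 2 * j) for j in range(4)] for i in range(6)]  # P = 6 tokens, C = 4 channels
a_tok, y_tok = token_attention(X)
a_ch, y_ch = channel_attention(X)
print(len(a_tok), len(a_tok[0]))  # 6 6  (token-to-token map)
print(len(a_ch), len(a_ch[0]))    # 4 4  (channel-to-channel map)
print(len(y_ch), len(y_ch[0]))    # 6 4  (same P x C layout as the input)
```

The only difference between the two functions is which axis of X plays the role of the sequence; everything else is ordinary scaled dot-product attention.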

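Accordingly, the authors use two kinds of attention, as shown in the figure below: the conventional window self-attention, computed along the token dimension, and channel group self-attention, computed along the channel dimension. The two are applied alternately.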

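[Figure: window self-attention along the token dimension vs. channel group self-attention along the channel dimension]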
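The sketch below shows how the two branches divide the work, using hypothetical sizes (an 8×8 token grid, 4×4 windows, 64 channels in 8 groups; these are not DaViT's actual settings). Window attention partitions the tokens, while channel group attention partitions the channels. The helper names are our own and do not come from the official repository.

```python
def window_partition(h, w, win):
    """Group the token indices of an h x w grid (row-major) into non-overlapping win x win windows."""
    if h % win or w % win:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for r0 in range(0, h, win):
        for c0 in range(0, w, win):
            windows.append([(r0 + r) * w + (c0 + c) for r in range(win) for c in range(win)])
    return windows


def channel_groups(c, num_groups):
    """Split channel indices 0..c-1 into num_groups contiguous groups of equal size."""
    if c % num_groups:
        raise ValueError("channel count must be divisible by the number of groups")
    g = c // num_groups
    return [list(range(k * g, (k + 1) * g)) for k in range(num_groups)]


def attention_map_entries(h, w, c, win, num_groups):
    """Count score entries: win^2 x win^2 per window (spatial), C_g x C_g per group (channel)."""
    spatial = len(window_partition(h, w, win)) * (win * win) ** 2
    channel = len(channel_groups(c, num_groups)) * (c // num_groups) ** 2
    return spatial, channel


print(window_partition(4, 4, 2))              # [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
print(attention_map_entries(8, 8, 64, 4, 8))  # (1024, 512)
```

In the channel branch each of the C_g × C_g scores is a dot product over all H·W tokens, which is where its image-level view comes from; the spatial branch only ever compares tokens inside the same window.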

Specifically, the authors propose the dual attention block, shown in the figure below. It alternates between spatial window multi-head self-attention and channel group attention, so that the model can capture both local fine-grained and global image-level interactions.

[Figure: the dual attention block]
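A minimal structural sketch of the alternating block follows. It assumes a pre-norm residual layout (x + f(norm(x))) with an FFN after each attention, as in standard ViT blocks; the real DaViT block has further details (separate LayerNorms with their own parameters, positional encoding) that are omitted here. The `order` switch is a hypothetical parameter that mirrors the ordering question examined in the ablation. Sub-layers are passed in as callables so the wiring can be checked with stand-ins.

```python
from typing import Callable, List

Tensor = List[List[float]]            # P x C token matrix
Layer = Callable[[Tensor], Tensor]


def residual(f: Layer, norm: Layer) -> Layer:
    """Pre-norm residual wrapper: x -> x + f(norm(x))."""
    def apply(x: Tensor) -> Tensor:
        y = f(norm(x))
        return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(x, y)]
    return apply


def dual_attention_block(spatial_attn: Layer, spatial_ffn: Layer,
                         channel_attn: Layer, channel_ffn: Layer,
                         norm: Layer, order: str = "spatial_first") -> Layer:
    """Chain a spatial-window sub-block and a channel-group sub-block in the chosen order."""
    spatial = [residual(spatial_attn, norm), residual(spatial_ffn, norm)]
    channel = [residual(channel_attn, norm), residual(channel_ffn, norm)]
    if order == "spatial_first":
        layers = spatial + channel
    elif order == "channel_first":
        layers = channel + spatial
    else:
        raise ValueError(f"unknown order: {order}")

    def apply(x: Tensor) -> Tensor:
        for layer in layers:
            x = layer(x)
        return x
    return apply


# Wiring check with stand-in layers that record their name and add nothing to the residual.
trace_log = []


def traced(name: str) -> Layer:
    def f(x: Tensor) -> Tensor:
        trace_log.append(name)
        return [[0.0] * len(row) for row in x]
    return f


block = dual_attention_block(traced("window_attn"), traced("ffn_a"),
                             traced("channel_attn"), traced("ffn_b"), norm=lambda x: x)
print(block([[1.0, 2.0]]), trace_log)
# [[1.0, 2.0]] ['window_attn', 'ffn_a', 'channel_attn', 'ffn_b']
```

Keeping the two attentions as interchangeable callables makes the block's only real decision, the order in which local and global mixing happen, explicit.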

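That is the core idea of the paper. Unlike earlier papers, it includes a dedicated Analysis section on the model's properties. As shown in the figure below, the paper uses visualizations to analyze how channel attention aggregates global information.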

[Figure: visualization of how channel attention aggregates global information]
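One way to see why the channel branch is global is a perturbation test. In the sketch below (pure Python; Q = K = V = x, no projections, a 1D token sequence split into two windows; all simplifications of ours, not the paper's procedure) one input token is nudged and we list which output tokens change. With window attention only tokens in the same window can change, because each output depends only on its own window. With channel attention every channel-to-channel score is a dot product over all tokens, so the nudge alters the mixing weights used at every position.

```python
import math


def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def windowed_token_attention(x, windows):
    """Self-attention over tokens, restricted to each window (Q = K = V = x)."""
    c = len(x[0])
    out = [None] * len(x)
    for win in windows:
        for i in win:
            w = softmax([sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(c) for j in win])
            out[i] = [sum(w[k] * x[j][d] for k, j in enumerate(win)) for d in range(c)]
    return out


def global_channel_attention(x):
    """Self-attention over channels; each score sums over *all* tokens (Q = K = V = x)."""
    p, c = len(x), len(x[0])
    cols = [[x[t][d] for t in range(p)] for d in range(c)]  # C x P
    out_cols = []
    for i in range(c):
        w = softmax([sum(a * b for a, b in zip(cols[i], cols[j])) / math.sqrt(p) for j in range(c)])
        out_cols.append([sum(w[j] * cols[j][t] for j in range(c)) for t in range(p)])
    return [[out_cols[d][t] for d in range(c)] for t in range(p)]  # back to P x C


def changed_tokens(fn, x, token, eps=1.0):
    """Indices of output tokens that change when one input token is shifted by eps."""
    base = fn(x)
    x2 = [row[:] for row in x]
    x2[token] = [v + eps for v in x2[token]]
    new = fn(x2)
    return [t for t in range(len(x)) if any(abs(a - b) > 1e-12 for a, b in zip(base[t], new[t]))]


windows = [[0, 1, 2, 3], [4, 5, 6, 7]]
x = [[math.sin(t + d) for d in range(3)] for t in range(8)]  # 8 tokens, 3 channels
print(changed_tokens(lambda z: windowed_token_attention(z, windows), x, token=1))  # only indices from 0-3
print(changed_tokens(global_channel_attention, x, token=1))                       # all 8 tokens
```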

In addition, the authors run visualization experiments to analyze the effectiveness of channel group attention. It shows strong global modeling capabilities by finding out fine-grained details of the main content in stage 1, and further focusing on some keypoints in stage 2. It then gradually refines the regions of interest from both global and local perspectives for final recognition.

[Figure: visualization of channel group attention across stages]

An interesting part of the ablation study is the order of the two attention modules, shown in the table below. We can see that the three strategies achieve similar performance, with 'window attention first' slightly better.

[Table: ablation on the order of the two attention modules]

From: https://www.cnblogs.com/gaopursuit/p/16904729.html
