【NeurIPS2022】ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer

Date: 2022-12-03 21:45:43

Tags: Rethinking, Transformer, Attention, NeurIPS2022, Self attention, dimensionality reduction, ScalableViT

This paper comes from the Graduate School at Shenzhen, Tsinghua University, and ByteDance.

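Since Swin, attention in vision transformers has usually had two parts: local window attention and global attention. Improvements to these models therefore tend to target one or both of them. This paper follows the same pattern. The overall framework is shown in the figure below, and its core consists of two parts: a local attention, Interactive Window Self-Attention (IWSA), and a global attention, Scalable Self-Attention (SSA).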

[Figure: overall framework of ScalableViT]

Interactive Window Self-Attention (IWSA)

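The IWSA framework is shown in the figure below. Attention is first computed separately inside each local window, and the per-window results are concatenated to give Z. The values that each window obtains from its FC layer are merged back into one map and processed with a depth-wise conv (the authors call this the local interactive module), which gives Y. Z and Y are added to produce the output.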

[Figure: structure of IWSA]
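The steps described above can be sketched in NumPy. This is a minimal single-head illustration, not the authors' implementation: the helper names (`window_partition`, `window_merge`, `depthwise_conv3x3`, `iwsa`), the 3x3 kernel size, the zero padding, and the scaling by the square root of C are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, ws):
    """(H, W, C) -> (num_windows, ws*ws, C), non-overlapping ws x ws windows."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, ws * ws, C)

def window_merge(w, ws, H, W):
    """Inverse of window_partition: (num_windows, ws*ws, C) -> (H, W, C)."""
    C = w.shape[-1]
    w = w.reshape(H // ws, W // ws, ws, ws, C).transpose(0, 2, 1, 3, 4)
    return w.reshape(H, W, C)

def depthwise_conv3x3(x, kernel):
    """Depth-wise 3x3 conv with zero padding. x: (H, W, C), kernel: (3, 3, C)."""
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W, C), dtype=float)
    for i in range(3):
        for j in range(3):
            # each channel is filtered with its own 3x3 weights
            out += padded[i:i + H, j:j + W, :] * kernel[i, j, :]
    return out

def iwsa(x, Wq, Wk, Wv, dw_kernel, ws):
    """Single-head IWSA sketch. x: (H, W, C); Wq, Wk, Wv: (C, C)."""
    H, W, C = x.shape
    q = window_partition(x @ Wq, ws)
    k = window_partition(x @ Wk, ws)
    v_map = x @ Wv                       # values kept as one spatial map
    v = window_partition(v_map, ws)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(C))
    z = window_merge(attn @ v, ws, H, W)     # Z: window attention, concatenated
    y = depthwise_conv3x3(v_map, dw_kernel)  # Y: values mixed across window borders
    return z + y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, ws = 8, 8, 4, 4
    x = rng.standard_normal((H, W, C))
    Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))
    dw = rng.standard_normal((3, 3, C))
    print(iwsa(x, Wq, Wk, Wv, dw, ws).shape)
```

Because the FC layer acts on each token independently, computing the values on the full map and then partitioning gives the same values as computing them per window. The depth-wise conv is the only step whose receptive field crosses window borders, which is what lets information flow between neighbouring windows.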

Scalable Self-Attention (SSA)

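The module structure is shown in the figure below. Earlier work such as PVT only applies spatial reduction, shrinking the number of tokens in K and V. Here the authors scale the spatial dimension by a factor \(r_n\) and additionally scale the channel dimension by a factor \(r_c\), which is what makes the attention scalable.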

[Figure: structure of SSA]
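A rough NumPy sketch of attention with both a spatial and a channel scaling factor is given below. Several things here are assumptions made for illustration and should be checked against the paper: that the spatial factor applies to K and V while the channel factor applies to Q and K, that spatial reduction can be stood in for by average pooling over `s x s` patches (so \(r_n = 1/s^2\)), and the function names `spatial_reduce` and `ssa`.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_reduce(x, H, W, s):
    """Average-pool (H*W, C) tokens over s x s patches -> (H*W // s**2, C).

    Assumption: pooling is a stand-in for whatever learned reduction is used.
    """
    C = x.shape[-1]
    x = x.reshape(H // s, s, W // s, s, C)
    return x.mean(axis=(1, 3)).reshape(-1, C)

def ssa(x, H, W, Wq, Wk, Wv, s):
    """Single-head scalable attention sketch.

    x:  (N, C) tokens with N = H * W
    Wq: (C, C2), Wk: (C, C2) with C2 = C * r_c (channel scaling)
    Wv: (C, C), so the output keeps C channels
    s:  pooling stride, i.e. r_n = 1 / s**2 (spatial scaling)
    """
    q = x @ Wq                              # (N, C2)
    x_small = spatial_reduce(x, H, W, s)    # (N * r_n, C)
    k = x_small @ Wk                        # (N * r_n, C2)
    v = x_small @ Wv                        # (N * r_n, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N * r_n)
    return attn @ v                         # (N, C)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, C2, s = 8, 8, 8, 16, 2         # r_c = 2, r_n = 1/4 (hypothetical)
    x = rng.standard_normal((H * W, C))
    Wq = rng.standard_normal((C, C2))
    Wk = rng.standard_normal((C, C2))
    Wv = rng.standard_normal((C, C))
    print(ssa(x, H, W, Wq, Wk, Wv, s).shape)
```

In this sketch the attention matrix has shape (N, N * r_n) instead of (N, N), so the spatial factor controls its size, while the channel factor only changes the width of the dot product between Q and K.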

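What matters is how the scaling factors are chosen. Interestingly, the authors argue that as the network gets deeper, the number of image patches shrinks and so does the redundancy, so the spatial factor \(r_n\) becomes larger in later stages. Because a lot of spatial information is lost, more channels are needed to compensate, so the channel factor is set to \(r_c\ge1\) in ScalableViT-S and ScalableViT-B, and to \(r_c<1\) in ScalableViT-L. The exact settings are listed in the table below.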

[Table: scaling factor settings for ScalableViT-S, ScalableViT-B and ScalableViT-L]
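To get a feel for what the two factors do to cost, the small helper below computes the shapes of Q, K and V and the multiply-accumulate count of the two attention matmuls. It is only a back-of-the-envelope sketch: `ssa_cost` is a hypothetical helper, it assumes the spatial factor acts on K and V and the channel factor on Q and K, and the numbers in the usage example are made up for illustration, not taken from the paper's table.

```python
def ssa_cost(N, C, r_n, r_c):
    """Shapes of Q, K, V and MACs of the two attention matmuls (single head).

    N: number of tokens, C: channels,
    r_n: spatial scaling factor, r_c: channel scaling factor.
    """
    n_kv = round(N * r_n)   # tokens kept in K and V
    c_qk = round(C * r_c)   # channels used by Q and K
    # Q K^T costs N * n_kv * c_qk, and attn @ V costs N * n_kv * C
    macs = N * n_kv * c_qk + N * n_kv * C
    return {"Q": (N, c_qk), "K": (n_kv, c_qk), "V": (n_kv, C), "macs": macs}

if __name__ == "__main__":
    # Hypothetical early-stage setting: 56 x 56 tokens, 64 channels.
    scaled = ssa_cost(56 * 56, 64, r_n=1 / 64, r_c=1.25)
    full = ssa_cost(56 * 56, 64, r_n=1.0, r_c=1.0)
    print(scaled)
    print(full["macs"] / scaled["macs"])
```

With a small \(r_n\) the token count of K and V dominates the saving, which leaves room to spend some of it on a wider Q and K (\(r_c\ge1\)) without approaching the cost of full attention.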

From: https://www.cnblogs.com/gaopursuit/p/16948828.html
