CVPR 2023 Paper Quick Overview: Transformer

Tags: Transformer, Transformers, ViTs, tokens, Vision, ViT, quick overview, CVPR2023
From: https://blog.csdn.net/weixin_44287798/article/details/139878976

Paper1 TrojViT: Trojan Insertion in Vision Transformers

Paper2 X-Pruner: eXplainable Pruning for Vision Transformers

Paper3 ViTs for SITS: Vision Transformers for Satellite Image Time Series

Paper4 NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers

Paper5 Dual-Path Adaptation From Image to Video Transformers

Paper6 SViTT: Temporal Learning of Sparse Video-Text Transformers

Paper7 An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling

Paper8 Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention

Paper9 Learning Imbalanced Data With Vision Transformers

Paper10 Mask3D: Pre-Training 2D Vision Transformers by Learning Masked 3D Priors

Paper11 Vision Transformers Are Good Mask Auto-Labelers

Paper12 Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers

Paper13 Visual Atoms: Pre-Training Vision Transformers With Sinusoidal Waves

Paper14 Correspondence Transformers With Asymmetric Feature Learning and Matching Flow Super-Resolution

Paper15 Feature Shrinkage Pyramid for Camouflaged Object Detection With Transformers

Paper16 Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch Corruptions

Paper17 RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving

Paper18 Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring

Paper19 Towards End-to-End Generative Modeling of Long Videos With Memory-Efficient Bidirectional Transformers

Paper20 MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers

Paper21 Transferable Adversarial Attacks on Vision Transformers With Token Gradient Regularization

Paper22 Making Vision Transformers Efficient From a Token Sparsification View

Paper23 TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers

Paper24 Recurrent Vision Transformers for Object Detection With Event Cameras

Paper25 RGB No More: Minimally-Decoded JPEG Vision Transformers

Paper26 Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

Paper27 Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers

Paper28 IS-GGT: Iterative Scene Graph Generation With Generative Transformers

Paper29 Devil Is in the Queries: Advancing Mask Transformers for Real-World Medical Image Segmentation and Out-of-Distribution Localization

Paper30 PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation With Progressive Video Transformers

Paper31 A Light Touch Approach to Teaching Transformers Multi-View Geometry

Paper32 Trade-Off Between Robustness and Accuracy of Vision Transformers

Paper33 Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers

Paper34 Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation

Paper35 Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers

Paper36 AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers

Paper37 Learning Expressive Prompting With Residuals for Vision Transformers

Paper38 Supervised Masked Knowledge Distillation for Few-Shot Transformers

Paper39 DeepVecFont-v2: Exploiting Transformers To Synthesize Vector Fonts With Higher Quality

Paper40 Teaching Matters: Investigating the Role of Supervision in Vision Transformers

Paper41 Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers

Paper42 You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?

Paper43 PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers

Paper44 Vision Transformers Are Parameter-Efficient Audio-Visual Learners

Paper45 Efficient Movie Scene Detection Using State-Space Transformers

Paper46 BAEFormer: Bi-Directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation

Paper47 N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution

Paper48 Semi-DETR: Semi-Supervised Object Detection With Detection Transformers

Paper49 CompletionFormer: Depth Completion With Convolutions and Vision Transformers

Paper50 SVGformer: Representation Learning for Continuous Vector Graphics Using Transformers