一. Motivation
1. Transformers model global context well, but their computational complexity is high, and the cost is dominated by the Q·K product. (We note that the scaled dot-product attention computation is actually to estimate the correlation of one token from the query and all the tokens from the key.)
In self-attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d)) V; the QK^T term alone costs O(N^2) for N tokens, which becomes prohibitive when the tokens are the pixels of a high-resolution feature map.
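For reference, a minimal PyTorch sketch of where this quadratic cost comes from when the tokens are the H*W positions of a feature map (the sizes below are made up for illustration, not taken from the paper):

import torch

B, C, H, W = 1, 32, 64, 64                    # hypothetical feature-map size
x = torch.randn(B, C, H, W)
q = k = v = x.flatten(2).transpose(1, 2)      # (B, N, C) with N = H*W = 4096 tokens
attn = (q @ k.transpose(-2, -1)) / C ** 0.5   # (B, N, N) attention map: 4096 x 4096
out = attn.softmax(dim=-1) @ v                # O(N^2 * C) time, O(N^2) memory

The 4096 x 4096 attention map is exactly what the frequency domain-based attention below avoids materializing.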
二. Contribution
1. An element-wise multiplication in the frequency domain is used to estimate the matrix multiplication of scaled dot-product attention, so self-attention can be computed efficiently and the overall complexity is reduced (see the numerical sketch after this list).
2. Simply using an FFN does not produce good results, so a discriminative frequency domain-based FFN (DFFN) is designed: a gating mechanism is introduced into the FFN to discriminatively determine which low- and high-frequency information should be preserved for image restoration.
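A quick 1-D numerical check of the convolution theorem behind contribution 1 (illustrative only; the actual FSAS applies 2-D FFTs to 8x8 patches of Q and K):

import torch

torch.manual_seed(0)
n = 8
q, k = torch.randn(n), torch.randn(n)

# Frequency-domain route: element-wise product of spectra, O(n log n)
fast = torch.fft.irfft(torch.fft.rfft(q) * torch.fft.rfft(k), n=n)

# Spatial-domain route: explicit circular convolution, O(n^2)
slow = torch.stack([sum(q[j] * k[(i - j) % n] for j in range(n)) for i in range(n)])

print(torch.allclose(fast, slow, atol=1e-5))    # True

The element-wise product of the spectra reproduces the spatial aggregation exactly, which is why the O(N^2) token-to-token product can be traded for FFTs.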
三. Network
1. FSAS (frequency domain-based self-attention solver)
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange

# Note: LayerNorm below is the custom (bias-aware) layer normalization from the official
# FFTformer code, not torch.nn.LayerNorm.

class FSAS(nn.Module):
    def __init__(self, dim, bias):
        super(FSAS, self).__init__()

        # 1x1 conv expands channels to 6*dim, then a depth-wise 3x3 conv adds spatial context;
        # the result is split into Q, K, V with 2*dim channels each.
        self.to_hidden = nn.Conv2d(dim, dim * 6, kernel_size=1, bias=bias)
        self.to_hidden_dw = nn.Conv2d(dim * 6, dim * 6, kernel_size=3, stride=1, padding=1, groups=dim * 6, bias=bias)

        self.project_out = nn.Conv2d(dim * 2, dim, kernel_size=1, bias=bias)

        self.norm = LayerNorm(dim * 2, LayerNorm_type='WithBias')

        self.patch_size = 8

    def forward(self, x):
        hidden = self.to_hidden(x)

        q, k, v = self.to_hidden_dw(hidden).chunk(3, dim=1)

        # Split Q and K into non-overlapping 8x8 patches and transform each patch to the frequency domain.
        q_patch = rearrange(q, 'b c (h patch1) (w patch2) -> b c h w patch1 patch2',
                            patch1=self.patch_size, patch2=self.patch_size)
        k_patch = rearrange(k, 'b c (h patch1) (w patch2) -> b c h w patch1 patch2',
                            patch1=self.patch_size, patch2=self.patch_size)
        q_fft = torch.fft.rfft2(q_patch.float())
        k_fft = torch.fft.rfft2(k_patch.float())

        # Element-wise product in the frequency domain replaces the O(N^2) Q·K^T matrix product.
        out = q_fft * k_fft
        out = torch.fft.irfft2(out, s=(self.patch_size, self.patch_size))
        out = rearrange(out, 'b c h w patch1 patch2 -> b c (h patch1) (w patch2)',
                        patch1=self.patch_size, patch2=self.patch_size)

        out = self.norm(out)

        # The normalized Q-K map gates V, then a 1x1 conv projects back to dim channels.
        output = v * out
        output = self.project_out(output)

        return output
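A hypothetical shape check (the dim value and input size are made up, the repo's LayerNorm class must be in scope, and H and W must be multiples of patch_size = 8):

x = torch.randn(2, 48, 64, 64)       # (B, dim, H, W)
attn = FSAS(dim=48, bias=False)
print(attn(x).shape)                 # torch.Size([2, 48, 64, 64]) -- same shape in and out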
2. DFFN (discriminative frequency domain-based FFN)
class DFFN(nn.Module):
    def __init__(self, dim, ffn_expansion_factor, bias):
        super(DFFN, self).__init__()

        hidden_features = int(dim * ffn_expansion_factor)

        self.patch_size = 8
        self.dim = dim
        self.project_in = nn.Conv2d(dim, hidden_features * 2, kernel_size=1, bias=bias)
        self.dwconv = nn.Conv2d(hidden_features * 2, hidden_features * 2, kernel_size=3, stride=1, padding=1,
                                groups=hidden_features * 2, bias=bias)

        # Learnable per-channel, per-frequency weights; the last dimension is patch_size // 2 + 1
        # because rfft2 only stores the non-redundant half of the spectrum.
        self.fft = nn.Parameter(torch.ones((hidden_features * 2, 1, 1, self.patch_size, self.patch_size // 2 + 1)))
        self.project_out = nn.Conv2d(hidden_features, dim, kernel_size=1, bias=bias)

    def forward(self, x):
        x = self.project_in(x)

        # FFT over non-overlapping 8x8 patches, re-weight each frequency component with the learnable
        # weights (the discriminative part: deciding which low/high frequencies to keep), then invert.
        x_patch = rearrange(x, 'b c (h patch1) (w patch2) -> b c h w patch1 patch2',
                            patch1=self.patch_size, patch2=self.patch_size)
        x_patch_fft = torch.fft.rfft2(x_patch.float())
        x_patch_fft = x_patch_fft * self.fft
        x_patch = torch.fft.irfft2(x_patch_fft, s=(self.patch_size, self.patch_size))
        x = rearrange(x_patch, 'b c h w patch1 patch2 -> b c (h patch1) (w patch2)',
                      patch1=self.patch_size, patch2=self.patch_size)

        # Gated FFN: depth-wise conv, split into two halves, GELU-gate one half with the other.
        x1, x2 = self.dwconv(x).chunk(2, dim=1)
        x = F.gelu(x1) * x2
        x = self.project_out(x)

        return x
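Likewise, a hypothetical shape check for DFFN (the dim, input size, and expansion factor of 3 are assumptions, not values taken from the paper):

x = torch.randn(2, 48, 64, 64)
ffn = DFFN(dim=48, ffn_expansion_factor=3, bias=False)
print(ffn(x).shape)                  # torch.Size([2, 48, 64, 64])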
Ablation study:
From: https://www.cnblogs.com/yyhappy/p/17815078.html