3D Object Detection Essay Reading 2024.04.05

Posted: 2024-04-05 17:13:08

EMIFF

  1. Paper: https://arxiv.org/abs/2303.10975

  2. Code: https://github.com/Bosszhe/EMIFF

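This paper proposes a new camera-based 3D detection framework, Enhanced Multi-scale Image Feature Fusion (EMIFF). Although EMIFF takes 2D images as input, the design of its neck should carry over to point-cloud-based 3D object detection as well, and modules such as MFC can easily be swapped for more advanced components.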

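To make full use of the combined vehicle and infrastructure viewpoints, the paper proposes the Multi-scale Cross Attention module (MCA, which consists of MFC and MFS) and the Camera-aware Channel Masking module (CCM). Together they enhance the infrastructure and vehicle features at the scale, spatial, and channel levels (MFC at the scale level, MFS at the spatial level, CCM at the channel level), which corrects the pose error introduced by camera asynchrony. The paper also introduces a Feature Compression (FC) module with channel and spatial compression blocks to improve transmission efficiency.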

MFC

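The MFC module is applied first, to the multi-scale features. Because pose error displaces the projected position on the 2D plane from the ground-truth position, a DCN (deformable convolution) is applied to the features at each scale so that every pixel can gather spatial information from its surroundings. UpConv blocks then upsample the features at the different scales to the same size.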

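import torch.nn as nn
# build_neck is the mmdet3d registry builder; this import path is an assumption
# about the repository layout and may differ between mmdet3d versions.
from mmdet3d.models.builder import build_neck


class double_conv(nn.Module):
    """Two 3x3 Conv-BN-ReLU layers; the spatial size is preserved."""

    def __init__(self, in_ch, out_ch):
        super(double_conv, self).__init__()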

        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.conv(x)
        return x
    
class DCN_Up_Conv_List(nn.Module):
    """Apply a DCN to each of four pyramid levels, then upsample level k by 2**k."""

    def __init__(self, neck_dcn, channels):
        super(DCN_Up_Conv_List, self).__init__()

        self.upconv0 = nn.Sequential(
            double_conv(channels,channels),
        )

        self.upconv1 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            double_conv(channels,channels),
        )
        self.upconv2 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            double_conv(channels,channels),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            double_conv(channels,channels),
        )
        self.upconv3 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            double_conv(channels,channels),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            double_conv(channels,channels),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            double_conv(channels,channels),
        )

        self.dcn0 = build_neck(neck_dcn)
        self.dcn1 = build_neck(neck_dcn)
        self.dcn2 = build_neck(neck_dcn)
        self.dcn3 = build_neck(neck_dcn)

    def forward(self, x):
        # x: list of four feature maps, highest resolution first
        assert len(x) == 4
        x0 = self.dcn0(x[0])
        x0 = self.upconv0(x0)

        x1 = self.dcn1(x[1])
        x1 = self.upconv1(x1)

        x2 = self.dcn2(x[2])
        x2 = self.upconv2(x2)

        x3 = self.dcn3(x[3])
        x3 = self.upconv3(x3)

        return [x0,x1,x2,x3]
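
The four outputs are later combined with torch.stack, which only works when every level ends up with the same spatial size. The following is a dependency-free sketch of that size bookkeeping, not code from the repository. It assumes each pyramid level is the previous one halved with integer division (an illustrative assumption; real backbones may round differently), and `pyramid_sizes`, `upsampled_size`, and `aligned_sizes` are hypothetical helper names. Each `nn.Upsample(scale_factor=2)` doubles both sides, so level k is doubled k times.

```python
def pyramid_sizes(base_hw, num_levels=4):
    """Sizes of each level, assuming every level halves the previous one (floor)."""
    h, w = base_hw
    sizes = []
    for _ in range(num_levels):
        sizes.append((h, w))
        h, w = h // 2, w // 2
    return sizes


def upsampled_size(hw, num_doublings):
    """Size after `num_doublings` applications of a 2x upsampling."""
    h, w = hw
    return (h * 2 ** num_doublings, w * 2 ** num_doublings)


def aligned_sizes(base_hw, num_levels=4):
    """Size of each level after level k has been upsampled k times."""
    levels = pyramid_sizes(base_hw, num_levels)
    return [upsampled_size(hw, k) for k, hw in enumerate(levels)]


if __name__ == "__main__":
    # Both sides divisible by 8: every level returns to the level-0 size.
    print(aligned_sizes((96, 160)))
    # Not divisible by 8: the coarser levels come back smaller than level 0.
    print(aligned_sizes((100, 150)))
```

Under these assumptions the levels only line up when the level-0 height and width are divisible by 8, so it is worth checking the input resolution if the stacking step raises a shape error.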

MFS

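MFS applies MeanPooling to obtain a representation of the infrastructure features at each scale, whereas the vehicle features at the different scales are first fused by a mean operation and then refined by MeanPooling. To find the correlation between vehicle and infrastructure features across scales, cross attention takes the infrastructure representations as keys and the vehicle representation as the query, producing an attention weight $\omega_{inf}^{m}$ for each scale $m$. The features $\hat{f}_{inf}^{m}$ are then multiplied by the weights $\omega_{inf}^{m}$. The final outputs of MCA are the enhanced infrastructure image feature $f_{inf}$ and vehicle image feature $f_{veh}$.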

import math

import torch
import torch.nn.functional as F


def attention(query, key, mask=None, dropout=None):
    """Compute scaled dot-product attention weights (no values are applied here)."""
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = F.softmax(scores, dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return p_attn

def extract_img_feat(self, img, img_metas):
    """Extract vehicle and infrastructure features from the image pair."""
    # Along dim 1, index 0 is the vehicle view and index 1 the infrastructure view
    img_v = img[:, 0, ...]
    img_i = img[:, 1, ...]

    # Vehicle branch: backbone -> neck -> MFC, then average over scales
    x_v = self.backbone_v(img_v)
    x_v = self.neck_v(x_v)
    x_v = self.dcn_up_conv_v(list(x_v))
    x_v_tensor = torch.stack(x_v).permute(1, 0, 2, 3, 4)  # [B, N_levels, C, H, W]
    x_v_out = torch.mean(x_v_tensor, dim=1)  # [B, C, H, W]

    # Infrastructure branch: backbone -> neck -> compression -> MFC
    x_i = self.backbone_i(img_i)
    x_i = self.neck_i(x_i)

    # Compression encoder-decoder (the FC module)
    x_i = self.inf_compressor(x_i)

    x_i = self.dcn_up_conv_i(list(x_i))
    x_i_tensor = torch.stack(x_i).permute(1, 0, 2, 3, 4)  # [B, N_levels, C, H, W]

    # query: [B, 1, C], key: [B, N_levels, C]
    query = torch.mean(x_v_out, dim=(-2, -1))[:, None, :]
    key = torch.mean(x_i_tensor, dim=(-2, -1))
    weights_i = attention(query, key).squeeze(1)  # [B, N_levels]

    # Weighted sum of the infrastructure features over scales
    x_i_out = (weights_i[:, :, None, None, None] * x_i_tensor).sum(dim=1)

    return x_v_out, x_i_out
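
The scale-weighting step can be followed on small numbers without PyTorch. The sketch below is a plain-Python illustration for a single sample, not the repository's implementation: one pooled vehicle descriptor is the query, one pooled infrastructure descriptor per scale is a key, and the softmax of the scaled dot products gives one weight per scale. The names `softmax`, `scale_weights`, and `fuse_scales` and all the numbers are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scale_weights(query, keys):
    """Scaled dot-product attention weights of one query against one key per scale."""
    d_k = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    return softmax(scores)


def fuse_scales(weights, features):
    """Weighted sum over scales; each feature is a flat list of equal length."""
    length = len(features[0])
    return [
        sum(w * feature[i] for w, feature in zip(weights, features))
        for i in range(length)
    ]


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0, 0.0]  # pooled vehicle descriptor, C = 4
    keys = [
        [2.0, 0.0, 0.0, 0.0],  # pooled infrastructure descriptor, scale 0
        [0.0, 2.0, 0.0, 0.0],  # scale 1
        [0.0, 0.0, 2.0, 0.0],  # scale 2
    ]
    weights = scale_weights(query, keys)
    print(weights)  # scale 0 is most aligned with the query, so it weighs most
    print(fuse_scales(weights, keys))
```

Here the keys double as the features being fused only to keep the example short; in the function above, the weights multiply the full feature maps `x_i_tensor`, not the pooled descriptors.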

CCM

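CCM learns a channel mask that weights the importance of each channel. Because different channels encode target information at different distances, and that information is closely tied to the camera parameters, it is intuitive to use the camera parameters as a prior for enhancing the image features. First, the camera intrinsics and extrinsics are flattened to one dimension and concatenated. An MLP then expands them to the feature dimension $C$ to produce the channel mask $M_{veh/inf}$. Finally, $M_{veh/inf}$ re-weights the image features $f_{veh/inf}$ channel-wise, giving $f'_{veh/inf}$.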

# Mlp and SE_Inception_Layer are helper modules that are not shown in this post.
class CCMNet(nn.Module):
    def __init__(self, in_channels, mid_channels, context_channels, reduction_ratio=1):
        super(CCMNet, self).__init__()
        self.reduce_conv = nn.Sequential(
            nn.Conv2d(in_channels,
                      mid_channels,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
        )
        self.context_conv = nn.Conv2d(mid_channels,
                                      context_channels,
                                      kernel_size=1,
                                      stride=1,
                                      padding=0)
        # 16 = 4 intrinsic values (fx, fy, cx, cy) + 12 extrinsic values (top 3x4 block)
        self.bn = nn.BatchNorm1d(16)
        self.context_mlp = Mlp(16, mid_channels, mid_channels)
        self.context_se = SE_Inception_Layer(mid_channels, reduction_ratio=reduction_ratio)  # NOTE: add camera-aware

    def ida_mat_cal(self, img_meta):
        """Build the 4x4 image-augmentation matrix (resize scale and optional horizontal flip)."""
        img_scale_factor = (img_meta['scale_factor'][:2]
                if 'scale_factor' in img_meta.keys() else 1)

        img_shape = img_meta['img_shape'][:2]
        orig_h, orig_w = img_shape

        ida_rot = torch.eye(2)
        ida_tran = torch.zeros(2)

        ida_rot *= img_scale_factor
        # ida_tran -= torch.Tensor(crop[:2])
        if 'flip' in img_meta.keys() and img_meta['flip']:
            A = torch.Tensor([[-1, 0], [0, 1]])
            b = torch.Tensor([orig_w, 0])
            ida_rot = A.matmul(ida_rot)
            ida_tran = A.matmul(ida_tran) + b

        ida_mat = ida_rot.new_zeros(4, 4)
        ida_mat[3, 3] = 1
        ida_mat[2, 2] = 1
        ida_mat[:2, :2] = ida_rot
        ida_mat[:2, 3] = ida_tran

        return ida_mat

    def forward(self, x_v, x_i, img_metas):
        # x_v, x_i: [bs, C, H, W]
        bs, C, H, W = x_v.shape
        num_cams = 2

        # Stack the two views and merge them into the batch: [bs * num_cams, C, H, W]
        x = torch.stack((x_v, x_i), dim=1).reshape(-1, C, H, W)

        extrinsic_v_list = list()
        extrinsic_i_list = list()
        intrinsic_v_list = list()
        intrinsic_i_list = list()
        for img_meta in img_metas:
            # Index 0 is the vehicle camera, index 1 the infrastructure camera
            extrinsic_v = torch.Tensor(img_meta['lidar2img']['extrinsic'][0])
            extrinsic_i = torch.Tensor(img_meta['lidar2img']['extrinsic'][1])
            intrinsic_v = torch.Tensor(img_meta['lidar2img']['intrinsic'][0])
            intrinsic_i = torch.Tensor(img_meta['lidar2img']['intrinsic'][1])

            # Fold the image resize and flip into the intrinsics
            ida_mat = self.ida_mat_cal(img_meta)
            intrinsic_v = ida_mat @ intrinsic_v
            intrinsic_i = ida_mat @ intrinsic_i

            extrinsic_v_list.append(extrinsic_v)
            extrinsic_i_list.append(extrinsic_i)
            intrinsic_v_list.append(intrinsic_v)
            intrinsic_i_list.append(intrinsic_i)

        extrinsic_v = torch.stack(extrinsic_v_list)
        extrinsic_i = torch.stack(extrinsic_i_list)
        intrinsic_v = torch.stack(intrinsic_v_list)
        intrinsic_i = torch.stack(intrinsic_i_list)

        extrinsic = torch.stack((extrinsic_v,extrinsic_i),dim=1) 
        intrinsic = torch.stack((intrinsic_v,intrinsic_i),dim=1) 

        in_mlp = torch.stack(
                    (
                        intrinsic[..., 0, 0],
                        intrinsic[..., 1, 1],
                        intrinsic[..., 0, 2],
                        intrinsic[ ..., 1, 2],
                    ),
                    dim=-1
                )

        # Top three rows of each 4x4 extrinsic matrix, flattened to 12 values
        ex_mlp = extrinsic[..., :3, :].view(bs, num_cams, -1)
        mlp_input = torch.cat((in_mlp, ex_mlp), dim=-1)  # [bs, num_cams, 16]
        mlp_input = mlp_input.reshape(-1, mlp_input.shape[-1]).to(x.device)

        mlp_input = self.bn(mlp_input)
        x = self.reduce_conv(x)
        context_se = self.context_mlp(mlp_input)
        context = self.context_se(x, context_se)
        context = self.context_conv(context)

        # This reshape only works when context_channels == C
        context = context.reshape(bs, num_cams, C, H, W)
        x_v_out = context[:, 0, ...]
        x_i_out = context[:, 1, ...]

        return x_v_out, x_i_out
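
To make the 16-value camera input concrete, here is a plain-Python sketch for a single camera; it is an illustration, not the repository's code. `camera_vector` mirrors the indexing used in `forward` (four intrinsic entries plus the top 3x4 block of the extrinsic matrix). `apply_channel_mask` shows channel-wise re-weighting with a sigmoid gate; the sigmoid is an assumption, since `SE_Inception_Layer` is not shown in this post. Both function names and all numbers are hypothetical.

```python
import math


def camera_vector(intrinsic, extrinsic):
    """Flatten one camera's parameters into a 16-value list.

    intrinsic, extrinsic: 4x4 matrices given as nested lists.
    """
    # fx, fy, cx, cy
    intr = [intrinsic[0][0], intrinsic[1][1], intrinsic[0][2], intrinsic[1][2]]
    # Top 3x4 block of the extrinsic matrix: 12 values
    extr = [value for row in extrinsic[:3] for value in row]
    return intr + extr


def apply_channel_mask(features, logits):
    """Scale each channel by sigmoid(logit). The sigmoid gate is an assumption."""
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [[g * v for v in channel] for g, channel in zip(gates, features)]


if __name__ == "__main__":
    # Illustrative camera parameters, not taken from any dataset.
    intrinsic = [
        [1000.0, 0.0, 960.0, 0.0],
        [0.0, 1000.0, 540.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    extrinsic = [
        [1.0, 0.0, 0.0, 0.5],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.5],
        [0.0, 0.0, 0.0, 1.0],
    ]
    vec = camera_vector(intrinsic, extrinsic)
    print(len(vec), vec[:4])

    # Two channels with two values each; a logit of 0 halves a channel.
    print(apply_channel_mask([[2.0, 4.0], [1.0, 3.0]], [0.0, 10.0]))
```

In the module above, the mask logits would come from the MLP applied to the batch-normalised camera vector; here they are passed in directly to keep the example short.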

From: https://www.cnblogs.com/ggyt/p/18115915
