
Rethinking Point Cloud Registration as Masking and Reconstruction: Paper Reading Notes

Posted: 2023-10-08 16:44:18

Rethinking Point Cloud Registration as Masking and Reconstruction

2023 ICCV

*Guangyan Chen, Meiling Wang, Li Yuan, Yi Yang, Yufeng Yue*; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 17717-17727

image-20231006212155211

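The title of this paper is eye-catching, but after a close read, the authors mainly borrow the MAE structure: the network is asked to predict the aligned point cloud, which improves the consistency of the feature descriptions of corresponding points across the two point clouds and assists the training of the feature extraction network.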

Abstract

Core premise of the paper: the invisible parts of each point cloud can serve as inherent masks, whereas the aligned point cloud pair can be treated as the reconstruction objective.

  • Treats point cloud registration as a masking and reconstruction process and, building on the idea of Point-MAE, proposes MRA (the Masked Reconstruction Auxiliary Network).
  • MRA can easily be plugged into other methods to further improve registration accuracy.
  • Based on MRA, proposes a novel method built on a standard transformer: MRT (the Masked Reconstruction Transformer).

Pipeline: encode features -> infer the contextual features and overall structures of the point cloud pair -> use the deviation correction module to correct the spatial deviations in the putative corresponding point pairs

Description

  • input:
    • source point cloud \(X = \{x_1, x_2, \dots, x_M\} \subseteq \mathbb{R}^3\)
    • target point cloud \(Y = \{y_1, y_2, \dots, y_N\} \subseteq \mathbb{R}^3\)
  • output: the rigid transformation \(\{\hat{R} \in SO(3), \hat{t} \in \mathbb{R}^3\}\) that aligns the source point cloud with the target point cloud.

(MRT is used to extract features, presumably as dense descriptions as well; MRA is used to assist the training of MRT.)

image-20231007092510507

  1. MRT step: the input point cloud pair \(X\), \(Y\) is densely described with KPConv, which yields the superpoints and their features \([\widetilde{X}:F^{\widetilde{X}}]\), \([\widetilde{Y}:F^{\widetilde{Y}}]\). The features \(F\) then pass through the Transformer Encoder Module, which extracts contextual information and overall structure and re-encodes each feature description \(F^{\widetilde{X}}\), \(F^{\widetilde{Y}}\).
  2. Auxiliary network step: two modules are used in parallel to train MRT:
    1. MRA: the MRA separately receives the encoded features of each point cloud and predicts the other aligned point cloud, reconstructing the complete point cloud.
    2. Registration network: predicts point correspondences \(\hat{y},\ \hat{x}\) and overlap scores \(\hat{o}^{\widetilde{X}},\ \hat{o}^{\widetilde{Y}}\) in the Deviation Correction module, then uses the Weighted Procrustes module to regress the transformation.

MRT Step

  1. KPConv network:
    1. input downsampled point clouds: \(X \in \mathbb{R}^{M \times 3}\), \(Y \in \mathbb{R}^{N \times 3}\)
    2. obtain superpoints and features: \([\widetilde{X} \in \mathbb{R}^{M^{'} \times 3}:F^{\widetilde{X}}\in \mathbb{R}^{M^{'} \times D}]\), \([\widetilde{Y} \in \mathbb{R}^{N^{'} \times 3}:F^{\widetilde{Y}}\in \mathbb{R}^{N^{'} \times D}]\)
  2. Transformer Encoder:
    1. input the superpoints and features into an \(L_e\)-layer transformer encoder (with cross-attention and sinusoidal positional encodings)
    2. output the encoded features \(\mathcal{F}^{\widetilde{X}}\), \(\mathcal{F}^{\widetilde{Y}}\).

Cross-attention helps the two point clouds extract mutually consistent features.
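To make the role of cross-attention concrete, here is a minimal NumPy sketch of a single-head cross-attention update between the superpoint features of the two clouds. It is only an illustration, not the paper's encoder: the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, the sharing of weights across both directions, and all sizes are assumptions, and positional encodings are omitted.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(f_query, f_other, w_q, w_k, w_v):
    """Update features of one cloud (P, D) by attending to the other cloud (Q, D)."""
    q = f_query @ w_q                                         # queries from this cloud
    k = f_other @ w_k                                         # keys from the other cloud
    v = f_other @ w_v                                         # values from the other cloud
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (P, Q), rows sum to 1
    return f_query + attn @ v                                 # residual update (assumed)

# Toy sizes: 5 superpoints in X, 7 in Y, feature width D = 8 (all hypothetical).
rng = np.random.default_rng(0)
D = 8
f_x = rng.normal(size=(5, D))
f_y = rng.normal(size=(7, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))

out_x = cross_attention(f_x, f_y, w_q, w_k, w_v)  # X attends to Y
out_y = cross_attention(f_y, f_x, w_q, w_k, w_v)  # Y attends to X
print(out_x.shape, out_y.shape)
```

Each updated feature of one cloud is a mixture of the other cloud's features, which is why corresponding points tend to end up with similar descriptors.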

MRA Step

image-20231008152235919

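A pure MAE-style network, in which each mask token stands for the representation of the corresponding aligned point cloud patch. The inputs are the point cloud patches before alignment, their tokens, the position embeddings generated from the GT rigid transformation, and the mask tokens. The output is the predicted aligned point cloud patch, which is compared with the GT-aligned result through a Chamfer loss.

Although on the surface there are many transformation-related operations here, on closer thought all of the transformation information is built on the GT. I am therefore inclined to think that this module, together with the cross-attention in MRT, improves the feature-level consistency of corresponding point pairs; it surely also improves the semantic completeness of the feature representation.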

  1. input -> MRT outputs: the superpoint pair and the corresponding encoded features: \([\widetilde{X} \in \mathbb{R}^{M^{'} \times 3}:\mathcal{F}^{\widetilde{X}}\in \mathbb{R}^{M^{'} \times D}]\), \([\widetilde{Y} \in \mathbb{R}^{N^{'} \times 3}:\mathcal{F}^{\widetilde{Y}}\in \mathbb{R}^{N^{'} \times D}]\).
  2. output: the Chamfer L2 loss between the predicted aligned point cloud patches and the ground truth aligned point cloud.

Steps:

  1. use FPS to extract center points \(\widetilde{X}_c\), \(\widetilde{Y}_c\) from the superpoints, then use KNN to generate point cloud patches; obtain the tokens \(T^{\widetilde{X}}\), \(T^{\widetilde{Y}}\) by composing the encoded features \(\mathcal{F}^{\widetilde{X}}\), \(\mathcal{F}^{\widetilde{Y}}\); use mask tokens \(T^{\widetilde{X}}_m \in \mathbb{R}^{g \times D}\), \(T^{\widetilde{Y}}_m \in \mathbb{R}^{g \times D}\) to stand for the aligned point cloud patches in the decoder output.
  2. use the ground truth transformations from \(Y\) to \(X\) and from \(X\) to \(Y\) to generate the position embeddings for each decoder layer.

image-20231008154227499

  3. use a transformer decoder (self-attention plus two FC layers) to reconstruct the mask tokens so that they represent the tokens of the aligned point cloud patches.
  4. use a two-layer MLP (two FC layers with ReLU) to predict the aligned point cloud patches corresponding to the decoded mask tokens.
  5. Chamfer loss: computed between the ground truth aligned point cloud patches and the predicted ones.

image-20231008154743263
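The patch construction (FPS centers plus KNN neighbors) and the Chamfer objective used in this section can be sketched in a few lines of NumPy. This is a rough illustration under assumed sizes, not the paper's implementation: the function names, the choice of the first point as the FPS seed, the number of centers `g=4`, the patch size `k=8`, and the exact normalization of the Chamfer distance are all assumptions.

```python
import numpy as np

def farthest_point_sampling(points, g, start=0):
    """Greedily pick g indices, each one farthest from the points already chosen."""
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(g - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        # Keep, for every point, its distance to the nearest chosen center.
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def knn_patches(points, center_idx, k):
    """Return (g, k) indices: the k nearest points to each center (center included)."""
    d = np.linalg.norm(points[center_idx][:, None, :] - points[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, :k]

def chamfer_l2(pred, gt):
    """Symmetric Chamfer distance with squared L2 between (P, 3) and (Q, 3) sets."""
    d2 = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(-1)   # (P, Q) squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
superpoints = rng.normal(size=(64, 3))            # stand-in for the superpoints
centers = farthest_point_sampling(superpoints, g=4)
patch_idx = knn_patches(superpoints, centers, k=8)
patches = superpoints[patch_idx]                  # (g, k, 3) point cloud patches

loss_same = chamfer_l2(patches[0], patches[0])            # identical sets
loss_shifted = chamfer_l2(patches[0] + 0.5, patches[0])   # misaligned sets
print(patches.shape, loss_same, loss_shifted)
```

In training, `pred` would be the patch decoded from a mask token and `gt` the same patch taken from the ground truth aligned point cloud.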

Coarse Registration Step

image-20231008160206420

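Because the features extracted by MRT strongly aggregate semantic information across the two point clouds (thanks to cross-attention), soft correspondence weights are computed from cosine similarity, and a weighted sum over the other cloud gives the corresponding point for each point. The features and the coordinates of the corresponding point pairs are then concatenated, and an MLP fits the deviation between the weighted-sum coordinates and the true positions, which produces more robust matches. (Predicting a bias in this way is a common pattern.) Finally, the Weighted Procrustes module predicts the rigid transformation.

I would rather describe it this way: coordinates obtained purely by weighted summation are likely to deviate from the true coordinates, so introducing another learnable component to adjust the weighted prediction makes the result more robust, more stable, and possibly more accurate; what one observes in practice is a deviation value. In addition, cosine similarity can to some extent increase the distinction between non-corresponding points.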
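The soft-correspondence step and the deviation correction can be sketched as follows. This is a hedged illustration only: the softmax `temperature`, the exact inputs concatenated for the MLP, the hidden width, and the random (untrained) weights `w1`, `w2` are assumptions and are not taken from the paper.

```python
import numpy as np

def soft_correspondences(f_x, f_y, y_pts, temperature=0.1):
    """For each X superpoint, a cosine-similarity-weighted average of Y coordinates."""
    f_x = f_x / np.linalg.norm(f_x, axis=1, keepdims=True)
    f_y = f_y / np.linalg.norm(f_y, axis=1, keepdims=True)
    logits = (f_x @ f_y.T) / temperature           # cosine similarity, (M', N')
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)              # soft correspondence weights
    return w @ y_pts, w                            # putative correspondences (M', 3)

def deviation_mlp(f_x, x_pts, y_hat, w1, w2):
    """Two-layer MLP predicting a 3D offset from features and point-pair coordinates."""
    h = np.maximum(np.concatenate([f_x, x_pts, y_hat], axis=1) @ w1, 0.0)  # ReLU
    return h @ w2

rng = np.random.default_rng(0)
D = 16
x_pts, y_pts = rng.normal(size=(5, 3)), rng.normal(size=(7, 3))
f_x, f_y = rng.normal(size=(5, D)), rng.normal(size=(7, D))

y_hat, w = soft_correspondences(f_x, f_y, y_pts)
w1 = rng.normal(size=(D + 6, 32)) * 0.1            # untrained placeholder weights
w2 = rng.normal(size=(32, 3)) * 0.1
y_corrected = y_hat + deviation_mlp(f_x, x_pts, y_hat, w1, w2)
print(y_hat.shape, y_corrected.shape)
```

Because `y_hat` is a convex combination of the target coordinates, it always lies inside their convex hull; the learned offset is what lets the corrected prediction move away from that averaged position.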

  1. input: the features \(\mathcal{F}^{\widetilde{X}}\), \(\mathcal{F}^{\widetilde{Y}}\) extracted by MRT
  2. output: the predicted rigid transformation \([\hat{R}; \hat{t}]\)

Steps:

  1. predict the corresponding points \(\mathcal{Y}\) for each superpoint in \(\widetilde{X}\):

image-20231008160656224

  2. use the features and an MLP to predict the deviations that need to be added to the predicted corresponding points:

image-20231008160921645

  3. predict the overlap score for each point, which indicates the probability that the point lies in the overlap region:

image-20231008161047649

  4. use the Weighted Procrustes module to predict the rigid transformation and compute the loss against the GT.
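The final step can be illustrated with the standard closed-form weighted Procrustes (weighted Kabsch) solution. This is the textbook SVD-based solver, shown here as a sketch; using per-correspondence confidences such as the overlap scores as the weights is an assumption about how the module is fed, and the toy data is synthetic.

```python
import numpy as np

def weighted_procrustes(src, dst, weights):
    """Find R, t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2."""
    w = weights / weights.sum()
    src_c = (w[:, None] * src).sum(axis=0)                # weighted centroids
    dst_c = (w[:, None] * dst).sum(axis=0)
    H = (src - src_c).T @ (w[:, None] * (dst - dst_c))    # 3x3 weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))                # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: recover a known rotation about the z axis plus a translation.
rng = np.random.default_rng(0)
angle = 0.7
R_gt = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                 [np.sin(angle),  np.cos(angle), 0.0],
                 [0.0,            0.0,           1.0]])
t_gt = np.array([0.3, -1.2, 0.5])
src = rng.normal(size=(20, 3))
dst = src @ R_gt.T + t_gt
weights = rng.uniform(0.1, 1.0, size=20)                  # stand-in for overlap scores

R_est, t_est = weighted_procrustes(src, dst, weights)
print(np.allclose(R_est, R_gt), np.allclose(t_est, t_gt))
```

Because every operation here is differentiable, a solver of this kind can sit at the end of a network and pass gradients from the transformation loss back to the correspondences and weights.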

Experiment

image-20231008162712340

image-20231008162725258

MRA's plug-and-play claim does hold up:

image-20231008162909364

From: https://www.cnblogs.com/name555difficult/p/17749565.html
