zotero-key: HP5VFNPQ
zt-attachments:
- "413"
title: "Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D"
citekey: philionLiftSplatShoot2020
Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
Abstract
The goal of perception for autonomous vehicles is to extract semantic representations from multiple sensors and fuse these representations into a single "bird's-eye-view" coordinate frame for consumption by motion planning. We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras. The core idea behind our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a rasterized bird's-eye-view grid. By training on the entire camera rig, we provide evidence that our model is able to learn not only how to represent images but how to fuse predictions from all cameras into a single cohesive representation of the scene while being robust to calibration error. On standard bird's-eye-view tasks such as object segmentation and map segmentation, our model outperforms all baselines and prior work. In pursuit of the goal of learning dense representations for motion planning, we show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network. We benchmark our approach against models that use oracle depth from lidar. Project page with code: https://nv-tlabs.github.io/lift-splat-shoot .
Comments
LSS: encodes camera images from an arbitrary rig (any number of cameras) by implicitly unprojecting them to 3D.
Input: multi-view surround images. Output: semantics in BEV coordinates, such as vehicles, drivable area, and lane lines.
LSS is widely regarded as the seminal BEV work; a long line of follow-up methods builds directly on it. A shape-level sketch of the interface follows.
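As a mental model, the end-to-end interface looks roughly like the sketch below; the tensor shapes and the `LiftSplat` name are illustrative assumptions, not the actual API of the released code:

```python
import torch

B, N = 1, 6                                # batch size, cameras on the rig (nuScenes uses 6)
imgs    = torch.rand(B, N, 3, 128, 352)    # surround-view images
intrins = torch.eye(3).expand(B, N, 3, 3)  # per-camera intrinsics K
rots    = torch.eye(3).expand(B, N, 3, 3)  # camera-to-ego rotations
trans   = torch.zeros(B, N, 3)             # camera-to-ego translations

# bev = LiftSplat(...)(imgs, rots, trans, intrins)  # hypothetical model wrapper
# bev: (B, C, X, Y) semantic grid in ego-centric BEV coordinates
```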
Q&A
0. Background and Motivation
In a conventional computer vision task such as object detection or semantic segmentation, the input is a single image and predictions are made in the same coordinate frame as that image. This does not match what perception for autonomous driving actually needs: sensor data arrives in several different coordinate frames, while the downstream planning module consumes perception results in the ego frame.
There are several simple, practical strategies for extending this single-image paradigm to the multi-view setting.
For 3D object detection from n cameras, for example, one can run the same single-image detector on every view to get 2D detections, then project those detections into the ego frame using each camera's intrinsics and extrinsics (a minimal sketch of the projection step follows).
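A minimal sketch of that projection, assuming a pinhole model and a known depth per detection (the function name and toy calibration values are hypothetical):

```python
import numpy as np

def pixel_to_ego(u, v, depth, K, R, t):
    """Unproject pixel (u, v) at a known depth into the ego frame.

    K: 3x3 intrinsics; R, t: camera-to-ego rotation and translation.
    """
    # Pixel -> camera frame: x_cam = depth * K^{-1} [u, v, 1]^T
    x_cam = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Camera frame -> ego frame: x_ego = R x_cam + t
    return R @ x_cam + t

# Toy forward-facing camera mounted 1.5 m ahead of and 1.6 m above the ego origin
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
print(pixel_to_ego(320, 240, depth=10.0, K=K, R=np.eye(3), t=np.array([1.5, 0.0, 1.6])))
```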
This extension preserves three valuable symmetries (stated formally after the list):
- Translation equivariance: if the pixel coordinates in an input image are shifted, the output shifts by the same amount
- Permutation invariance: the final output does not depend on the ordering of the n cameras
- Ego-frame isometry equivariance: the same objects are detected in a given image no matter where the capturing camera sits relative to the ego vehicle; equivalently, if the ego frame is translated or rotated, the output translates or rotates with it
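Stated formally (a sketch; $f$ is the full detection pipeline, $\tau_\Delta$ a pixel shift by $\Delta$, $\sigma$ a permutation of the $n$ camera inputs, and $E$ a rigid motion of the ego frame):

$$
f(\tau_\Delta x) = \tau_\Delta f(x), \qquad
f\big(x_{\sigma(1)},\dots,x_{\sigma(n)}\big) = f\big(x_1,\dots,x_n\big), \qquad
f(E \cdot \mathrm{rig}) = E \cdot f(\mathrm{rig})
$$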
The drawbacks of this simple approach are just as clear:
there is no data-driven way to learn the best mechanism for fusing information across cameras, and no way to use backpropagation so that feedback from the downstream planner automatically improves the perception system.
The paper proposes a model named "Lift-Splat" that preserves the three symmetries identified above while being end-to-end differentiable and data-driven; its core "lift" step is sketched below.
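A minimal sketch of the "lift" step as the paper describes it: each pixel predicts a categorical distribution $\alpha$ over $D$ discrete depth bins and a context vector $c$, and the frustum feature at depth bin $d$ is the outer product $\alpha_d \cdot c$. Tensor shapes and the function name here are assumptions, not the released implementation:

```python
import torch

def lift(feats, depth_logits):
    """'Lift' image features into a frustum of features.

    feats:        (B*N, C, h, w) per-pixel context vectors from the image backbone
    depth_logits: (B*N, D, h, w) unnormalized scores over D discrete depth bins
    returns:      (B*N, D, C, h, w) frustum features alpha_d * c per pixel and depth
    """
    alpha = depth_logits.softmax(dim=1)             # categorical depth distribution
    return alpha.unsqueeze(2) * feats.unsqueeze(1)  # outer product via broadcasting

frustum = lift(torch.rand(6, 64, 8, 22), torch.rand(6, 41, 8, 22))
print(frustum.shape)  # torch.Size([6, 41, 64, 8, 22])
```

The "splat" step then scatters every frustum point into its BEV grid cell and sum-pools the features that land in the same cell (the paper accelerates this with a cumulative-sum trick), yielding the rasterized BEV representation.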
LSS's way of learning intrinsic representations from multi-view cameras builds on what was then the state of the art in sensor fusion and monocular object detection. The release of 360° surround-camera datasets such as nuScenes, Lyft Level 5, Waymo, and Argoverse was also instrumental.
Monocular Object Detection
Monocular object detectors are characterized by how they model the transformation from the image plane to a given 3D coordinate frame.
One standard approach is two-stage: first apply a mature 2D object detection network to the image, then train a second network to regress the 2D boxes into 3D boxes (SSD-6D, MonoGRNet, etc.); a toy sketch of the second stage follows.
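A toy sketch of such a second stage: regress a 7-DoF 3D box (center, size, yaw) from features pooled inside a 2D detection. This is a generic illustration of the two-stage recipe, not SSD-6D or MonoGRNet specifically:

```python
import torch
import torch.nn as nn

class Box3DHead(nn.Module):
    """Second-stage head: 2D ROI features -> 3D box parameters."""
    def __init__(self, in_dim=256):
        super().__init__()
        # 3D center (x, y, z), size (w, l, h), yaw -> 7 numbers per box
        self.mlp = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 7))

    def forward(self, roi_feats):   # (num_boxes, in_dim) pooled from 2D detections
        return self.mlp(roi_feats)  # (num_boxes, 7)

boxes3d = Box3DHead()(torch.rand(5, 256))
print(boxes3d.shape)  # torch.Size([5, 7])
```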
Another line of work (pseudo-lidar) trains one network for monocular depth prediction and a second network for bird's-eye-view detection on the resulting pseudo point cloud.