Overview of Embodied AI (Part 1)
Course homepage: https://ai-workshops.github.io/building-and-working-in-environments-for-embodied-ai-cvpr-2022/
slides
video
Instructor:
Zhiwei Jia
Course content
Simulators
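When choosing a simulation engine, the main factors to consider are:
- Rendering: sensor data such as RGB, depth, optical flow, and segmentation
- Physics: different categories of tasks need different levels of physical simulation. For example, partial physics is enough for visual navigation, while tasks such as opening a door or pouring water need full physics.
- Speed: rendering speed
- Object types and properties: properties such as rigid, liquid, sliceable, and breakable
- Action modeling: either high level (e.g., put an object in some location) or low level (e.g., pouring water)
- Human interface: how a person interacts with the simulator, e.g., keyboard and mouse or VR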
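The physics requirement above can be sketched as a small lookup. This is only an illustration: the task labels, the `PhysicsLevel` enum, and the `simulator_supports` helper are hypothetical names, and the rule that a full-physics simulator can also serve partial-physics tasks is an assumption, not something stated in the lecture.

```python
from enum import Enum


class PhysicsLevel(Enum):
    PARTIAL = "partial"
    FULL = "full"


# Mapping taken from the examples in the notes; the keys are hypothetical labels.
REQUIRED_PHYSICS = {
    "visual_navigation": PhysicsLevel.PARTIAL,
    "open_door": PhysicsLevel.FULL,
    "pour_water": PhysicsLevel.FULL,
}


def simulator_supports(task: str, simulator_physics: PhysicsLevel) -> bool:
    """Return True if a simulator with the given physics level can host the task."""
    required = REQUIRED_PHYSICS[task]
    # Assumption: full physics covers partial-physics tasks, but not the reverse.
    return simulator_physics is PhysicsLevel.FULL or required is PhysicsLevel.PARTIAL


print(simulator_supports("visual_navigation", PhysicsLevel.PARTIAL))
print(simulator_supports("pour_water", PhysicsLevel.PARTIAL))
```

A real selection would also weigh rendering speed, supported object properties, and the human interface, which this sketch leaves out.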
Assets
- object:
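-- Grasping: grasping a variety of static objects; datasets: YCB, EGAD
-- General Manipulation Skill: operations such as pouring water, opening doors, and closing drawers; datasets: PartNet-Mobility, DoorGym, Objects from iTHOR, Meta-World
-- Multisensory: combines multiple modalities (vision, sound, touch, etc.); datasets: ObjectFolder, ThreeDWorld
- scene: static scenes (e.g., the layout of a home)
- demonstrations: state-based trajectories; see the course slides for an example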
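As a rough picture of what a state-based demonstration trajectory could look like in code, here is a minimal sketch. The `Step` and `Demonstration` classes and their fields are hypothetical; the datasets named above each define their own formats, which are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    state: List[float]   # simulator state at this step (e.g., joint positions, object poses)
    action: List[float]  # action the demonstrator took from this state


@dataclass
class Demonstration:
    task: str
    steps: List[Step] = field(default_factory=list)

    def add(self, state, action) -> None:
        # Copy the inputs so later mutation by the caller does not alter the record.
        self.steps.append(Step(list(state), list(action)))

    def states(self) -> List[List[float]]:
        return [s.state for s in self.steps]


# Hypothetical two-step trajectory for a drawer-opening task.
demo = Demonstration(task="open_drawer")
demo.add([0.0, 0.0], [0.1])
demo.add([0.1, 0.0], [0.1])
print(len(demo.steps))
```

Storing states rather than rendered images keeps the record compact; observations can be re-rendered from the states if the simulator is deterministic, which is an assumption here.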
Task
- Locomotion: e.g., control a robot dog to perform a series of actions
- Visual Navigation:
-- Object Goal Navigation: Specify an object category and ask the agent to find it
-- Embodied Question Answering: ask an agent to answer a question that requires it to navigate in the scene
- Object Manipulation: opening a drawer, wiping a table, washing a cup and then making coffee
- Rearrangement: bring the poses of objects to a specified configuration
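One simple way to picture the rearrangement goal is a check that every object ends up close to its target. The sketch below is an assumption-laden illustration: it compares positions only (orientation is ignored), and the function name, the object names, and the 5 cm tolerance are all hypothetical rather than taken from any benchmark.

```python
import math
from typing import Dict, Tuple

Position = Tuple[float, float, float]


def rearrangement_success(
    current: Dict[str, Position],
    goal: Dict[str, Position],
    tol: float = 0.05,
) -> bool:
    """Return True if every goal object is within `tol` of its goal position."""
    for name, goal_pos in goal.items():
        if name not in current:
            return False  # a required object is missing from the scene
        if math.dist(current[name], goal_pos) > tol:
            return False  # object is too far from where it should be
    return True


goal = {"cup": (1.0, 0.0, 0.5)}
print(rearrangement_success({"cup": (1.0, 0.02, 0.5)}, goal))
print(rearrangement_success({"cup": (1.2, 0.0, 0.5)}, goal))
```

A fuller check would also compare orientations and, for articulated objects, joint states.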