One AI Model to Rule All Robots

Posted: 2024-10-31 17:35:47
Tags: AI, model, robot, robots, CrossFormer, robotic, model, data

Original article: Robotic Control Module: One AI Model for Any Robot - IEEE Spectrum

A new model can operate virtually any robot design, including arms, quadrupeds, and drones

UC Berkeley/Carnegie Mellon University

The software used to control a robot is normally highly adapted to its specific physical setup. But now researchers have created a single general-purpose robotic control policy that can operate robotic arms, wheeled robots, quadrupeds, and even drones.

One of the biggest challenges when it comes to applying machine learning to robotics is the paucity of data. While computer vision and natural language processing can piggyback off the vast quantities of image and text data found on the Internet, collecting robot data is costly and time-consuming.

To get around this, there have been growing efforts to pool data collected by different groups on different kinds of robots, including the Open X-Embodiment and DROID datasets. The hope is that training on diverse robotics data will lead to “positive transfer,” which refers to when skills learned from training on one task help to boost performance on another.

The problem is that robots often have very different embodiments—a term used to describe their physical layout and suite of sensors and actuators—so the data they collect can vary significantly. For instance, a robotic arm might be static, have a complex arrangement of joints and fingers, and collect video from a camera on its wrist. In contrast, a quadruped robot is regularly on the move and relies on force feedback from its legs to maneuver. The kinds of tasks and actions these machines are trained to carry out are also diverse: The arm may pick and place objects, while the quadruped needs keen navigation.

 

That makes training a single AI model on these large collections of data challenging, says Homer Walke, a Ph.D. student at the University of California, Berkeley. So far, most attempts have either focused on data from a narrower selection of similar robots or relied on researchers manually tweaking the data to make observations from different robots more similar. But in research to be presented at the Conference on Robot Learning (CoRL) in Munich in November, Walke and his colleagues unveiled a new model called CrossFormer that can train on data from a diverse set of robots and control them just as well as specialized control policies.

 

How to control diverse robots with the same AI model

 

The team used the same model architecture that powers large language models, known as a transformer. In many ways, the challenge the researchers were trying to solve is not dissimilar to that facing a chatbot, says Walke. In language modeling, the AI has to pick out similar patterns in sentences with different lengths and word orders. Robot data can also be arranged in a sequence much like a written sentence, but depending on the particular embodiment, observations and actions vary in length and order too.

 

“Words might appear in different locations in a sentence, but they still mean the same thing,” says Walke. “In our task, an observation image might appear in different locations in the sequence, but it’s still fundamentally an image and we still want to treat it like an image.”

Most machine learning approaches work through a sequence one element at a time, but transformers can process the entire stream of data at once. This allows them to analyze the relationship between different elements and makes them better at handling sequences that are not standardized, much like the diverse data found in large robotics datasets.

 

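To make the idea of processing a whole sequence at once concrete, the sketch below implements single-head self-attention in plain Python. Every token is compared with every other token in one pass, and a mask lets sequences of different lengths share the same code path. This is a generic illustration and not CrossFormer's implementation: the function names, the identity query/key/value projections, and the toy vectors are all assumptions made for brevity.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; -inf scores receive zero weight."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[List[float]], mask: List[bool]) -> List[List[float]]:
    """Single-head scaled dot-product attention over a whole sequence.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a trained transformer learns these maps.
    mask[i] is False for padding tokens, which no query may attend to.
    The mask must keep at least one token.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Score this query against every key in the sequence at once.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            if keep else float("-inf")
            for key, keep in zip(tokens, mask)
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all unmasked tokens.
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ])
    return outputs


# Two real tokens plus one padding token (hypothetical values).
sequence = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
attended = self_attention(sequence, mask=[True, True, False])
```

Because the padding token is masked out, the first two rows of `attended` are the same as if the sequence had contained only the two real tokens, which is what lets sequences of unequal length be batched together.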

Walke and his colleagues aren’t the first to train transformers on large-scale robotics data. But previous approaches have either trained solely on data from robotic arms with broadly similar embodiments or manually converted input data to a common format to make it easier to process. In contrast, CrossFormer can process images from cameras positioned above a robot, at head height or on a robotic arm’s wrist, as well as joint position data from both quadrupeds and robotic arms, without any tweaks.

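One way to picture how differently shaped observations can end up in a single sequence is sketched below. Each sensor reading is cut into fixed-width tokens tagged with its modality, so an image stays "an image" wherever it lands in the sequence, and shorter sequences are padded with a mask. This is only an illustration under stated assumptions: the sensor names, the token width `TOKEN_DIM`, and the helper functions are hypothetical and are not taken from CrossFormer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TOKEN_DIM = 4  # hypothetical fixed width of every token


@dataclass
class Token:
    modality: str        # "image" or "joints"; decides how the token is treated
    source: str          # sensor name, e.g. "wrist_cam"
    values: List[float]  # always TOKEN_DIM numbers


def chunk(values: List[float], modality: str, source: str) -> List[Token]:
    """Split a flat reading into fixed-width tokens, zero-padding the last one."""
    tokens = []
    for start in range(0, len(values), TOKEN_DIM):
        piece = values[start:start + TOKEN_DIM]
        piece = piece + [0.0] * (TOKEN_DIM - len(piece))
        tokens.append(Token(modality, source, piece))
    return tokens


def tokenize(observation: Dict[str, Tuple[str, List[float]]]) -> List[Token]:
    """Turn one robot's observation into a single token sequence.

    Keys are sensor names; values are (modality, flat readings). Robots
    expose different sensors, so sequence length varies by embodiment.
    """
    sequence: List[Token] = []
    for source, (modality, values) in observation.items():
        sequence.extend(chunk(values, modality, source))
    return sequence


def pad_batch(sequences: List[List[Token]]) -> Tuple[List[List[Token]], List[List[bool]]]:
    """Pad to the longest sequence; the mask marks real (non-padding) tokens."""
    longest = max(len(seq) for seq in sequences)
    padding = Token("pad", "none", [0.0] * TOKEN_DIM)
    padded = [seq + [padding] * (longest - len(seq)) for seq in sequences]
    masks = [[True] * len(seq) + [False] * (longest - len(seq)) for seq in sequences]
    return padded, masks


# Hypothetical readings: an arm with a wrist camera, a quadruped with leg joints only.
arm = {"wrist_cam": ("image", [0.1] * 8), "arm_joints": ("joints", [0.0] * 7)}
quadruped = {"leg_joints": ("joints", [0.0] * 12)}
batch, masks = pad_batch([tokenize(arm), tokenize(quadruped)])
```

Here the arm yields four tokens and the quadruped three, so the quadruped's sequence gets one padding token that the mask marks as ignorable. No per-robot conversion to a common observation format is needed.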

 

The result is a single control policy that can operate single robotic arms, pairs of robotic arms, quadrupeds, and wheeled robots on tasks as varied as picking and placing objects, cutting sushi, and obstacle avoidance. Crucially, it matched the performance of specialized models tailored for each robot and outperformed previous approaches trained on diverse robotic data. The team even tested whether the model could control an embodiment not included in the dataset—a small quadcopter. While they simplified things by making the drone fly at a fixed altitude, CrossFormer still outperformed the previous best method.

“That was definitely pretty cool,” says Ria Doshi, an undergraduate student at Berkeley. “I think that as we scale up our policy to be able to train on even larger sets of diverse data, it’ll become easier to see this kind of zero-shot transfer onto robots that have been completely unseen in the training.”

The limitations of one AI model for all robots

The team admits there’s still work to do, however. The model is too big for any of the robots’ embedded chips and instead has to be run from a server. Even then, processing times are only just fast enough to support real-time operation, and Walke admits that could break down if they scale up the model. “When you pack so much data into a model it has to be very big and that means running it for real-time control becomes difficult.”

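The real-time constraint comes down to simple arithmetic: each control step has a time budget of one control period, and the server's inference time plus the network round trip has to fit inside it. The sketch below works through that check. The article gives no timings, so every number here is hypothetical, as are the function names.

```python
def fits_real_time(control_hz: float, inference_ms: float, network_ms: float) -> bool:
    """True if one server round trip finishes within one control period."""
    period_ms = 1000.0 / control_hz
    return inference_ms + network_ms <= period_ms


def max_control_hz(inference_ms: float, network_ms: float) -> float:
    """Highest control rate the round trip can sustain."""
    return 1000.0 / (inference_ms + network_ms)


# Hypothetical numbers: a 20 Hz controller has a 50 ms budget per step.
ok_now = fits_real_time(control_hz=20, inference_ms=30, network_ms=10)
# Suppose a scaled-up model doubled inference time to 60 ms.
ok_scaled = fits_real_time(control_hz=20, inference_ms=60, network_ms=10)
```

With these made-up figures the current model fits the budget and the larger one does not, which is the kind of breakdown Walke describes when scaling up.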

 

One potential workaround would be to use an approach called distillation, says Oier Mees, a postdoctoral researcher at Berkeley and part of the CrossFormer team. This essentially involves training a smaller model to mimic the larger model, and if successful can result in similar performance for a much smaller computational budget.

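The core of distillation can be shown with a toy example: a small "student" is trained on the outputs of a larger "teacher" rather than on the original labels. The sketch below is a minimal illustration in plain Python, not something the CrossFormer team has reported doing. The teacher is a stand-in function, the student is a two-parameter linear model, and all names and hyperparameters are assumptions.

```python
from typing import Callable, List, Tuple


def teacher(x: float) -> float:
    """Stand-in for a large policy: 100 tiny linear units summed together."""
    return sum(0.02 * x + 0.01 for _ in range(100))  # roughly 2x + 1


def distill(
    teacher_fn: Callable[[float], float],
    inputs: List[float],
    steps: int = 2000,
    lr: float = 0.05,
) -> Tuple[float, float]:
    """Fit a student y = w*x + b to the teacher's outputs by gradient descent."""
    # The training targets come from the teacher, not from ground-truth labels.
    targets = [teacher_fn(x) for x in inputs]
    n = len(inputs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - t for x, t in zip(inputs, targets)]
        # Gradients of the mean squared error between student and teacher.
        grad_w = sum(2 * e * x for e, x in zip(errors, inputs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical inputs the teacher is queried on.
w, b = distill(teacher, [-1.0, -0.5, 0.0, 0.5, 1.0])
```

The student ends up reproducing the teacher's behavior with two parameters instead of a hundred units. A real policy is far harder to compress, which is why the article frames distillation only as a potential workaround.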

 

More important than the computing resource problem, however, is that the team failed to see any positive transfer in their experiments: CrossFormer simply matched the performance of specialized policies rather than exceeding it. Walke thinks progress in computer vision and natural language processing suggests that training on more data could be the key.

Others say it might not be that simple. Jeannette Bohg, a professor of robotics at Stanford University, says the ability to train on such a diverse dataset is a significant contribution. But she wonders whether part of the reason why the researchers didn’t see positive transfer is their insistence on not aligning the input data. Previous research that trained on robots with similar observation and action data has shown evidence of such cross-overs. “By getting rid of this alignment, they may have also gotten rid of this significant positive transfer that we’ve seen in other work,” Bohg says.

It’s also not clear if the approach will boost performance on tasks specific to particular embodiments or robotic applications, says Ram Ramamoorthy, a robotics professor at Edinburgh University. The work is a promising step towards helping robots capture concepts common to most robots, like “avoid this obstacle,” he says. But it may be less useful for tackling control problems specific to a particular robot, such as how to knead dough or navigate a forest, which are often the hardest to solve.

From: https://www.cnblogs.com/bjrobot/p/18518480
