A faster, better way to train general-purpose robots: New technique pools diverse data

October 28, 2024

Editors' notes: This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility: fact-checked, preprint, trusted source, proofread.

Researchers filmed multiple instances of a robotic arm feeding co-author Jialiang Zhao's adorable dog, Momo. The videos were included in datasets to train the robot. Credit: Massachusetts Institute of Technology

In the classic cartoon "The Jetsons," Rosie the robotic maid seamlessly switches from vacuuming the house to cooking dinner to taking out the trash. But in real life, training a general-purpose robot remains a major challenge.

Typically, engineers collect data that are specific to a certain robot and task, which they use to train the robot in a controlled environment. However, gathering these data is costly and time-consuming, and the robot will likely struggle to adapt to environments or tasks it hasn't seen before.

To train better general-purpose robots, MIT researchers developed a versatile technique that combines a huge amount of heterogeneous data from many sources into one system that can teach any robot a wide range of tasks.

Their method involves aligning data from varied domains, like simulations and real robots, and multiple modalities, including vision sensors and robotic arm position encoders, into a shared "language" that a generative AI model can process.

The work is published on the arXiv preprint server.

Because it combines such an enormous amount of data, this approach can be used to train a robot to perform a variety of tasks without the need to start training it from scratch each time.

This method could be faster and less expensive than traditional techniques because it requires far fewer task-specific data. In addition, it outperformed training from scratch by more than 20% in simulation and real-world experiments.

"In robotics, people often claim that we don't have enough training data. But in my view, another big problem is that the data come from so many different domains, modalities, and robot hardware. Our work shows how you'd be able to train a robot with all of them put together," says Lirui Wang, an electrical engineering and computer science (EECS) graduate student and lead author of the paper on this technique.

Wang's co-authors include fellow EECS graduate student Jialiang Zhao; Xinlei Chen, a research scientist at Meta; and senior author Kaiming He, an associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the Conference on Neural Information Processing Systems, held 10–15 December at the Vancouver Convention Center.

Inspired by LLMs

A robotic "policy" takes in sensor observations, like camera images or proprioceptive measurements that track the speed and position a robotic arm, and then tells a robot how and where to move.

Policies are typically trained using imitation learning, meaning a human demonstrates actions or teleoperates a robot to generate data, which are fed into an AI model that learns the policy. Because this method uses a small amount of task-specific data, robots often fail when their environment or task changes.
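Concretely, imitation learning is often implemented as behavior cloning: the policy network is regressed onto the demonstrated actions. Here is a minimal PyTorch sketch, with a stand-in network and synthetic data rather than the authors' model:

    import torch
    import torch.nn as nn

    # Stand-in policy: a small MLP over a 14-dim proprioceptive observation.
    policy_net = nn.Sequential(nn.Linear(14, 64), nn.ReLU(), nn.Linear(64, 7))
    optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

    # Synthetic demonstration batch of (observation, expert action) pairs.
    obs = torch.randn(32, 14)
    expert_action = torch.randn(32, 7)

    for step in range(100):
        pred_action = policy_net(obs)
        loss = nn.functional.mse_loss(pred_action, expert_action)  # imitate the demo
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Because the loss only measures agreement with the demonstrations, the policy gets no signal about situations the demonstrator never encountered, which is why small task-specific datasets generalize poorly.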

To develop a better approach, Wang and his collaborators drew inspiration from large language models like GPT-4.

These models are pretrained using an enormous amount of diverse language data and then fine-tuned by feeding them a small amount of task-specific data. Pretraining on so much data helps the models adapt to perform well on a variety of tasks.

"In the language domain, the data are all just sentences. In robotics, given all the heterogeneity in the data, if you want to pretrain in a similar manner, we need a different architecture," he says.

Robotic data take many forms, from camera images to language instructions to depth maps. At the same time, each robot is mechanically unique, with a different number and orientation of arms, grippers, and sensors. Plus, the environments where data are collected vary widely.

The MIT researchers developed a new architecture called Heterogeneous Pretrained Transformers (HPT) that unifies data from these varied modalities and domains.

In the middle of their architecture, they put a machine-learning model known as a transformer, which processes vision and proprioception inputs. A transformer is the same type of model that forms the backbone of large language models.

The researchers align data from vision and proprioception into the same type of input, called a token, which the transformer can process. Each input is represented with the same fixed number of tokens.
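One way to realize this, sketched below under assumed dimensions (the module names, token width, and token count are illustrative, not the paper's exact configuration), is to give each modality its own small encoder that always emits the same number of tokens:

    import torch
    import torch.nn as nn

    TOKEN_DIM, NUM_TOKENS = 128, 16   # every modality maps to 16 tokens of width 128

    class VisionStem(nn.Module):
        """Cut an image into patches and project each patch to one token."""
        def __init__(self):
            super().__init__()
            # 224/56 = 4 patches per side, so 16 patches -> 16 tokens.
            self.patchify = nn.Conv2d(3, TOKEN_DIM, kernel_size=56, stride=56)
        def forward(self, img):                        # img: (B, 3, 224, 224)
            tokens = self.patchify(img)                # (B, 128, 4, 4)
            return tokens.flatten(2).transpose(1, 2)   # (B, 16, 128)

    class ProprioStem(nn.Module):
        """Project a proprioceptive state vector into the same token layout."""
        def __init__(self, state_dim=14):
            super().__init__()
            self.proj = nn.Linear(state_dim, NUM_TOKENS * TOKEN_DIM)
        def forward(self, state):                      # state: (B, 14)
            return self.proj(state).view(-1, NUM_TOKENS, TOKEN_DIM)

    img, state = torch.randn(2, 3, 224, 224), torch.randn(2, 14)
    tokens = torch.cat([VisionStem()(img), ProprioStem()(state)], dim=1)  # (2, 32, 128)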

Then the transformer maps all inputs into one shared space, growing into a huge, pretrained model as it processes and learns from more data. The larger the transformer becomes, the better it will perform.

A user only needs to feed HPT a small amount of data on their robot's design, setup, and the task they want it to perform. Then HPT transfers the knowledge the transformer gained during pretraining to learn the new task.
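Adapting to a new robot then amounts to wrapping the shared pretrained trunk with a small robot-specific input stem and action head, and training those new parts on the modest task-specific dataset. Below is a rough sketch of one common transfer recipe (freezing the trunk), with hypothetical module sizes; the authors' actual training procedure is described in the paper:

    import torch
    import torch.nn as nn

    # Shared trunk: a standard transformer encoder, pretrained on pooled data.
    trunk = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True),
        num_layers=4,
    )
    for p in trunk.parameters():
        p.requires_grad = False        # keep the pretrained knowledge fixed

    # New-robot parts: a stem for its sensors and a head for its action space.
    new_stem = nn.Linear(20, 128)      # this robot reports a 20-dim state per slot
    new_head = nn.Linear(128, 6)       # and takes 6-dim actions

    optimizer = torch.optim.Adam(
        list(new_stem.parameters()) + list(new_head.parameters()), lr=1e-4
    )

    state = torch.randn(8, 16, 20)     # small task-specific batch, 16 token slots
    target = torch.randn(8, 6)
    features = trunk(new_stem(state))  # through the frozen trunk: (8, 16, 128)
    loss = nn.functional.mse_loss(new_head(features.mean(dim=1)), target)
    optimizer.zero_grad()
    loss.backward()                    # parameter gradients reach only stem and head
    optimizer.step()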

Enabling dexterous motions

One of the biggest challenges of developing HPT was building the massive dataset used to pretrain the transformer: 52 datasets comprising more than 200,000 robot trajectories across four categories, including human demo videos and simulation.

The researchers also needed to develop an efficient way to turn raw proprioception signals from an array of sensors into data the transformer could handle.

"Proprioception is key to enable a lot of dexterous motions. Because the number of tokens is in our architecture always the same, we place the same importance on proprioception and vision," Wang explains.

When they tested HPT, it improved robot performance by more than 20% on simulation and real-world tasks, compared with training from scratch each time. Even when the task was very different from the pretraining data, HPT still improved performance.

"This paper provides a novel approach to training a single policy across multiple robot embodiments. This enables training across diverse datasets, enabling robot learning methods to significantly scale up the size of datasets that they can train on. It also allows the model to quickly adapt to new robot embodiments, which is important as new robot designs are continuously being produced," says David Held, associate professor at the Carnegie Mellon University Robotics Institute, who was not involved with this work.

In the future, the researchers want to study how data diversity could boost the performance of HPT. They also want to enhance HPT so it can process unlabeled data like GPT-4 and other large language models.

"Our dream is to have a universal robot brain that you could download and use for your robot without any training at all. While we are just in the early stages, we are going to keep pushing hard and hope scaling leads to a breakthrough in robotic policies, like it did with large language models," he says.

More information: Lirui Wang et al, Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers, arXiv (2024). DOI: 10.48550/arxiv.2409.20537

Journal information: arXiv

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: A faster, better way to train general-purpose robots: New technique pools diverse data (2024, October 28), retrieved 28 October 2024 from https://techxplore.com/news/2024-10-faster-general-purpose-robots-technique.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.

