Researchers develop AI motion 'translation' model for controlling different kinds of robots

May 8, 2025

MotionGlot is a model that can generate motion trajectories that obey user instructions across multiple embodiments with different action dimensions, such as (a) quadruped robots and (b) humans. Panels (a, b) show the qualitative benchmark of MotionGlot against the adapted templates (A.T.) of [1] on the text-to-robot-motion (Section IV-A.1) and Q&A-with-human-motion (Section IV-C) tasks, respectively. The overall quantitative performance across tasks is shown in (c). In (a, b), increasing opacity indicates forward time. Credit: arXiv (2024). DOI: 10.48550/arxiv.2410.16623

Brown University researchers have developed an artificial intelligence model that can generate movement in robots and animated figures in much the same way that AI models like ChatGPT generate text.

A paper describing this work is published on the arXiv preprint server.

The model, called MotionGlot, enables users to simply type an action, such as "walk forward a few steps and take a right," and the model generates an accurate representation of that motion to command a robot or animated avatar.

The model's key advance, according to the researchers, is its ability to "translate" motion across robot and figure types, from humanoids to quadrupeds and beyond. That enables the generation of motion for a wide range of robot embodiments and in all kinds of spatial configurations and contexts.

"We're treating movement as merely one other language," mentioned Sudarshan Harithas, a Ph.D. pupil in pc science at Brown, who led the work. "And simply as we are able to translate languages—from English to Chinese language, for instance—we are able to now translate language-based instructions to corresponding actions throughout a number of embodiments. That permits a broad set of recent purposes."

The research will be presented later this month at the 2025 International Conference on Robotics and Automation in Atlanta. The work was co-authored by Harithas and his advisor, Srinath Sridhar, an assistant professor of computer science at Brown.

Large language models like ChatGPT generate text through a process called "next-token prediction," which breaks language down into a sequence of tokens, or small chunks, like individual words or characters. Given a single token or a string of tokens, the language model makes a prediction about what the next token is likely to be.
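
As a rough illustration of that loop, the Python sketch below samples one token at a time from a hand-written bigram table until an end marker appears. The table, the token names and the generate helper are all invented for this example; in a real model such as ChatGPT or MotionGlot, a trained neural network supplies the next-token probabilities.

```python
import random

# Toy bigram "model": probability of the next token given the current token.
# These tokens and probabilities are made up for illustration; a real system
# replaces this lookup table with a trained neural network.
BIGRAM = {
    "<start>": {"walk": 0.7, "turn": 0.3},
    "walk": {"forward": 0.8, "left": 0.2},
    "turn": {"left": 0.5, "right": 0.5},
    "forward": {"<end>": 1.0},
    "left": {"<end>": 1.0},
    "right": {"<end>": 1.0},
}

def generate(max_tokens=10):
    # Repeatedly predict and append the next token until <end> appears.
    tokens = ["<start>"]
    while tokens[-1] != "<end>" and len(tokens) < max_tokens:
        distribution = BIGRAM[tokens[-1]]
        choices, weights = zip(*distribution.items())
        tokens.append(random.choices(choices, weights=weights)[0])
    return [t for t in tokens if t not in ("<start>", "<end>")]

print(generate())  # e.g. ['walk', 'forward']
```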

These models have been hugely successful at generating text, and researchers have begun applying similar approaches to motion. The idea is to break the components of motion (the discrete positions of the legs during walking, for example) down into tokens. Once the motion is tokenized, fluid movements can be generated through next-token prediction.
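
Tokenizing motion can be pictured as snapping each frame of movement to the nearest entry in a small codebook of poses. The sketch below is a deliberately simplified assumption, not MotionGlot's actual tokenizer, which in practice would use a codebook learned from data (for example, via vector quantization) over far richer pose representations.

```python
import numpy as np

# Hypothetical codebook of poses, each a vector of two joint angles.
# A real motion tokenizer learns its codebook from data.
CODEBOOK = np.array([
    [0.0, 0.0],   # token 0: legs together
    [0.5, -0.5],  # token 1: left leg forward
    [-0.5, 0.5],  # token 2: right leg forward
])

def tokenize(frames):
    # frames: (T, 2) array of joint angles -> one token index per frame,
    # chosen as the nearest codebook entry in Euclidean distance.
    dists = np.linalg.norm(frames[:, None, :] - CODEBOOK[None, :, :], axis=-1)
    return dists.argmin(axis=1).tolist()

def detokenize(tokens):
    # Token indices -> (T, 2) array of joint angles.
    return CODEBOOK[np.asarray(tokens)]

walk_cycle = np.array([[0.5, -0.5], [0.0, 0.0], [-0.5, 0.5], [0.0, 0.0]])
print(tokenize(walk_cycle))  # [1, 0, 2, 0]
```

Once motion is in token form, the same next-token machinery used for text can generate it.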

One challenge with this approach is that motions for one body type can look very different on another. For example, when a person is walking a dog down the street, the person and the dog are both doing something called "walking," but their actual motions are very different. One is upright on two legs; the other is on all fours.

According to Harithas, MotionGlot can translate the meaning of walking from one embodiment to another. So a user commanding a figure to "walk forward in a straight line" will get the correct motion output whether they happen to be commanding a humanoid figure or a robot dog.

To train their model, the researchers used two datasets, each containing hours of annotated motion data. QUAD-LOCO features dog-like quadruped robots performing a variety of actions along with rich text describing those actions. A similar dataset called QUES-CAP contains real human movement, along with detailed captions and annotations appropriate to each movement.

Using that training data, the model reliably generates appropriate actions from text prompts, even actions it has never specifically seen before. In testing, the model was able to recreate specific instructions, like "a robot walks backwards, turns left and walks forward," as well as more abstract prompts like "a robot walks happily."

It can even use motion to answer questions. When asked, "Can you show me movement in cardio activity?" the model generates a person jogging.

"These fashions work greatest once they're skilled on heaps and plenty of knowledge," Sridhar mentioned. "If we may gather large-scale knowledge, the mannequin could be simply scaled up."

The model's current functionality and its adaptability across embodiments make for promising applications in human-robot collaboration, gaming and virtual reality, and digital animation and video production, the researchers say. They plan to make the model and its source code publicly available so other researchers can use it and expand on it.

More information: Sudarshan Harithas et al, MotionGlot: A Multi-Embodied Motion Generation Model, arXiv (2024). DOI: 10.48550/arxiv.2410.16623

Journal information: arXiv

Provided by Brown University
