May 8, 2025
Researchers develop AI motion 'translation' model for controlling different kinds of robots
![MotionGlot is a model that can generate motion trajectories that obey user instructions across multiple embodiments with different action dimensions, such as (a) quadruped robots, and (b) humans. The figures (a,b) depict the qualitative benchmark of MotionGlot against the adapted templates (A.T) of [1] on the text-to-robot motion (Section IV-A.1), Q&A with human motion (Section IV-C) tasks respectively. The overall quantitative performance across tasks is shown in (c). In (a,b), increasing opacity indicates forward time. Credit: arXiv (2024). DOI: 10.48550/arxiv.2410.16623 Researchers develop AI motion 'translation' model for controlling different kinds of robots](https://scx1.b-cdn.net/csz/news/800a/2025/researchers-develop-ai-7.jpg)
Brown University researchers have developed an artificial intelligence model that can generate movement in robots and animated figures in much the same way that AI models like ChatGPT generate text.
A paper describing this work is published on the arXiv preprint server.
The model, called MotionGlot, enables users to simply type an action—"walk forward a few steps and take a right"—and the model can generate accurate representations of that motion to command a robot or animated avatar.
The model's key advance, according to the researchers, is its ability to "translate" motion across robot and figure types, from humanoids to quadrupeds and beyond. That enables the generation of motion for a wide range of robot embodiments and in all kinds of spatial configurations and contexts.
"We're treating movement as merely one other language," mentioned Sudarshan Harithas, a Ph.D. pupil in pc science at Brown, who led the work. "And simply as we are able to translate languages—from English to Chinese language, for instance—we are able to now translate language-based instructions to corresponding actions throughout a number of embodiments. That permits a broad set of recent purposes."
The research will be presented later this month at the 2025 International Conference on Robotics and Automation in Atlanta. The work was co-authored by Harithas and his advisor, Srinath Sridhar, an assistant professor of computer science at Brown.
Large language models like ChatGPT generate text through a process called "next token prediction," which breaks language down into a sequence of tokens, or small chunks, like individual words or characters. Given a single token or a string of tokens, the language model makes a prediction about what the next token might be.
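In code, next-token prediction is just a loop: feed the tokens generated so far into the model, take its predicted distribution over the vocabulary, sample one token, and append it. The sketch below is a minimal, generic illustration in Python, assuming a PyTorch-style model that maps a token sequence to per-position logits; it is not MotionGlot's actual implementation.

```python
import torch

def generate(model, tokens, max_new_tokens, eos_id):
    """Minimal autoregressive next-token prediction loop.

    Assumes `model` maps a 1 x T token tensor to logits of shape
    1 x T x vocab_size; the names here are illustrative placeholders.
    """
    for _ in range(max_new_tokens):
        logits = model(tokens)               # scores for every position
        next_logits = logits[:, -1, :]       # keep only the final position
        probs = torch.softmax(next_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # sample one token
        tokens = torch.cat([tokens, next_token], dim=1)       # append and repeat
        if next_token.item() == eos_id:      # stop at end-of-sequence
            break
    return tokens
```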
These models have been incredibly successful in generating text, and researchers have begun using similar approaches for motion. The idea is to break down the components of motion—the discrete position of the legs during the process of walking, for example—into tokens. Once the motion is tokenized, fluid movements can be generated through next token prediction.
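For that to work, continuous motion first has to be mapped into a discrete vocabulary. The sketch below uses simple uniform binning of joint angles as a stand-in; real systems typically learn the discretization (for example with a vector-quantized autoencoder), so treat this as an assumed toy scheme rather than the paper's method.

```python
import numpy as np

def tokenize_motion(poses, low=-3.14, high=3.14, num_bins=256):
    """Map continuous joint angles (frames x joints) to integer tokens
    by uniform binning. A learned codebook would replace this grid in
    practice; the binning here is only an illustration."""
    poses = np.clip(poses, low, high)
    bins = ((poses - low) / (high - low) * (num_bins - 1)).astype(int)
    return bins.flatten()  # one token per joint per frame

def detokenize_motion(tokens, num_joints, low=-3.14, high=3.14, num_bins=256):
    """Invert the binning to recover approximate joint angles."""
    angles = tokens.astype(float) / (num_bins - 1) * (high - low) + low
    return angles.reshape(-1, num_joints)
```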
One challenge with this approach is that motions for one body type can look very different for another. For example, when a person is walking a dog down the street, the person and the dog are both doing something called "walking," but their actual motions are very different. One is upright on two legs; the other is on all fours.
According to Harithas, MotionGlot can translate the meaning of walking from one embodiment to another. So a user commanding a figure to "walk forward in a straight line" will get the correct motion output whether they happen to be commanding a humanoid figure or a robot dog.
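One plausible way a single model can serve multiple bodies is to condition generation on an embodiment tag supplied alongside the text instruction. The prompt layout and special tokens below ("<human>", "<quadruped>") are hypothetical illustrations, not MotionGlot's documented input format.

```python
def build_prompt(instruction: str, embodiment: str, tokenizer):
    """Prefix the instruction with an embodiment tag so one model can
    target different bodies. The tag names are hypothetical."""
    assert embodiment in ("<human>", "<quadruped>"), "unknown embodiment tag"
    return tokenizer.encode(f"{embodiment} {instruction}")

# The same command routed to two embodiments:
#   build_prompt("walk forward in a straight line", "<human>", tok)
#   build_prompt("walk forward in a straight line", "<quadruped>", tok)
```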
To train their model, the researchers used two datasets, each containing hours of annotated motion data. QUAD-LOCO features dog-like quadruped robots performing a variety of actions along with rich text describing those actions. A similar dataset called QUES-CAP contains real human movement, along with detailed captions and annotations appropriate to each movement.
Using that training data, the model reliably generates appropriate actions from text prompts, even actions it has never specifically seen before. In testing, the model was able to follow specific instructions, like "a robot walks backwards, turns left and walks forward," as well as more abstract prompts like "a robot walks happily."
It can even use motion to answer questions. When asked, "Can you show me movement in cardio activity?" the model generates a person jogging.
"These fashions work greatest once they're skilled on heaps and plenty of knowledge," Sridhar mentioned. "If we may gather large-scale knowledge, the mannequin could be simply scaled up."
The model's current functionality and its adaptability across embodiments make for promising applications in human-robot collaboration, gaming and virtual reality, and digital animation and video production, the researchers say. They plan to make the model and its source code publicly available so other researchers can use it and expand on it.
More information: Sudarshan Harithas et al, MotionGlot: A Multi-Embodied Motion Generation Model, arXiv (2024). DOI: 10.48550/arxiv.2410.16623
Journal information: arXiv

Provided by Brown University
