AI generates data to help embodied agents ground language to 3D world

June 16, 2025

Sadie Harley

scientific editor

Robert Egan

associate editor

Editors' notes

This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility:

fact-checked

preprint

trusted source

proofread

A new 3D-text dataset, 3D-GRAND, leverages generative AI to create synthetic rooms that are automatically annotated with 3D structures. The dataset's 40,087 household scenes can help train embodied AI, like household robots, connect language to 3D spaces. Credit: Joyce Chai

A new, densely annotated 3D-text dataset called 3D-GRAND can help train embodied AI, like household robots, to connect language to 3D spaces. The study, led by University of Michigan researchers, was presented at the Computer Vision and Pattern Recognition (CVPR) Conference in Nashville, Tennessee, on June 15, and published on the arXiv preprint server.

When tested against models trained on previous 3D datasets, the model trained on 3D-GRAND reached 38% grounding accuracy, surpassing the previous best model by 7.7%. Its hallucination rate also dropped sharply, to 6.67% from the previous state-of-the-art rate of 48%.

The dataset contributes to the next generation of household robots that will far exceed the robotic vacuums that currently populate homes. Before we can command a robot to "pick up the book next to the lamp on the nightstand and bring it to me," the robot must be trained to understand what language refers to in space.

"Large multimodal language models are mostly trained on text with 2D images, but we live in a 3D world. If we want a robot to interact with us, it must understand spatial terms and perspectives, interpret object orientations in space, and ground language in the rich 3D environment," said Joyce Chai, a professor of computer science and engineering at U-M and senior author of the study.

While text- or image-based AI models can pull an enormous amount of information from the internet, 3D data is scarce. It's even harder to find 3D data paired with grounded text, meaning that specific words like "sofa" are linked to the 3D coordinates bounding the actual sofa.
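To make the idea of grounded text concrete, here is a minimal sketch of how one description might link phrases to 3D boxes. The class names, field names, object IDs and coordinate values are all hypothetical illustrations; they are not taken from the 3D-GRAND release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned 3D box stored as min and max corners (hypothetical convention)."""
    min_xyz: tuple
    max_xyz: tuple


@dataclass(frozen=True)
class GroundedPhrase:
    """A character span in the text, linked to one object in the scene."""
    start: int
    end: int
    object_id: str


@dataclass
class GroundedDescription:
    text: str
    phrases: list   # list of GroundedPhrase
    objects: dict   # object_id -> Box3D

    def resolve(self, phrase):
        """Return the words of a phrase and the 3D box they refer to."""
        return self.text[phrase.start:phrase.end], self.objects[phrase.object_id]


def span(text, phrase, object_id):
    """Locate a phrase in the text and link it to an object ID."""
    start = text.index(phrase)
    return GroundedPhrase(start, start + len(phrase), object_id)


text = "A grey sofa faces the wooden coffee table."
desc = GroundedDescription(
    text=text,
    phrases=[
        span(text, "grey sofa", "sofa_1"),
        span(text, "wooden coffee table", "table_1"),
    ],
    objects={
        "sofa_1": Box3D((0.0, 0.0, 0.0), (2.0, 0.9, 0.8)),
        "table_1": Box3D((0.5, 1.5, 0.0), (1.5, 2.1, 0.4)),
    },
)

for phrase in desc.phrases:
    words, box = desc.resolve(phrase)
    print(f"{words!r} -> {box.min_xyz} to {box.max_xyz}")
```

In this sketch every noun phrase carries an explicit pointer to a box, which is the property that distinguishes grounded text from an ordinary caption.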

Like all LLMs, 3D-LLMs perform best when trained on large datasets. However, building a large dataset by imaging rooms with cameras would be time-intensive and expensive, as annotators must manually specify objects and their spatial relationships and link words to their corresponding objects.

The research team took a new approach, leveraging generative AI to create synthetic rooms that are automatically annotated with 3D structures. The resulting 3D-GRAND dataset includes 40,087 household scenes paired with 6.2 million densely grounded descriptions of the rooms.

"A big advantage of synthetic data is that labels come for free because you already know where the sofa is, which makes the curation process easier," said Jianing Jed Yang, a doctoral student of computer science and engineering at U-M and lead author of the study.

After generating the synthetic 3D data, an AI pipeline first used vision models to describe each object's color, shape and material. From here, a text-only model generated descriptions of entire scenes while using scene graphs—structured maps of how objects relate to each other—to ensure each noun phrase is grounded to specific 3D objects.

A final quality control step used a hallucination filter to ensure each object mentioned in the text actually has a corresponding object in the 3D scene.
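The filtering step can be pictured with a small sketch. It assumes, purely for illustration, that generated sentences tag each grounded phrase inline as `<obj id="...">phrase</obj>`; that markup, the function names and the object IDs are hypothetical, and the actual filter in the study may work differently.

```python
import re

# Hypothetical inline markup: <obj id="sofa_1">grey sofa</obj>
TAG = re.compile(r'<obj id="([^"]+)">(.*?)</obj>')


def referenced_ids(sentence):
    """Collect every object ID that a sentence claims to refer to."""
    return [match.group(1) for match in TAG.finditer(sentence)]


def passes_filter(sentence, scene_ids):
    """Keep a sentence only if all referenced IDs exist in the scene.

    A sentence with no tagged objects passes trivially in this sketch.
    """
    return all(obj_id in scene_ids for obj_id in referenced_ids(sentence))


scene_ids = {"sofa_1", "lamp_2"}  # objects known to exist in the synthetic room
sentences = [
    'The <obj id="sofa_1">grey sofa</obj> sits beside the <obj id="lamp_2">floor lamp</obj>.',
    'A <obj id="piano_9">grand piano</obj> stands by the window.',  # no such object
]

kept = [s for s in sentences if passes_filter(s, scene_ids)]
print(len(kept), "of", len(sentences), "sentences kept")
```

Because the rooms are synthetic, the set of valid object IDs is known exactly, so a check of this kind needs no human judgment.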

Human evaluators spot-checked 10,200 room-annotation pairs to ensure reliability by assessing whether there were any inaccuracies in AI-generated sentences or objects. The synthetic annotations had a low error rate of about 5% to 8%, which is comparable to professional human annotations.

"Given the size of the dataset, the LLM-based annotation reduces both the cost and time by an order of magnitude compared to human annotation, creating 6.2 million annotations in just two days. It is widely recognized that collecting high-quality data at scale is essential for building effective AI models," said Yang.

To put the new dataset to the test, the research team trained a model on 3D-GRAND and compared it with three baseline models (3D-LLM, LEO and 3D-VISTA). The ScanRefer benchmark evaluated grounding accuracy (how much the predicted bounding box overlaps with the true object boundary), while a newly introduced benchmark called 3D-POPE evaluated object hallucinations.
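Box overlap of this kind is commonly measured as intersection over union (IoU), with a prediction counted as correct when its IoU passes a threshold. The sketch below assumes axis-aligned boxes and a threshold of 0.25; the article does not state which threshold the reported accuracy uses, and the function names and box values are illustrative.

```python
def box_volume(box):
    """Volume of an axis-aligned box given as (min_corner, max_corner)."""
    (x0, y0, z0), (x1, y1, z1) = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    lo = tuple(max(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(min(a[1][i], b[1][i]) for i in range(3))
    inter = box_volume((lo, hi))  # zero when the boxes do not overlap
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(predicted, truth, threshold=0.25):
    """Share of predictions whose IoU with the true box meets the threshold."""
    if not predicted or len(predicted) != len(truth):
        raise ValueError("need equally sized, non-empty box lists")
    hits = sum(iou_3d(p, t) >= threshold for p, t in zip(predicted, truth))
    return hits / len(predicted)


pred_a = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
true_a = ((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))   # overlaps half of pred_a
true_b = ((5.0, 5.0, 5.0), (6.0, 6.0, 6.0))   # no overlap at all

print(iou_3d(pred_a, true_a))
print(grounding_accuracy([pred_a, pred_a], [true_a, true_b]))
```

Here the first pair shares a volume of 4 out of a union of 12, giving an IoU of one third, so it counts as a hit at the 0.25 threshold while the second pair does not.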

The model trained on 3D-GRAND reached a 38% grounding accuracy with only a 6.67% hallucination rate, far exceeding the competing generative models. While 3D-GRAND contributes to the 3D-LLM modeling community, testing on robots will be the next step.
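The article does not define how the hallucination rate is computed. One plausible reading, assumed here only for illustration, is a yes/no probe: the model is asked whether an object is in the room, and the rate is the share of questions about absent objects that it answers "yes". The function name and the sample answers below are hypothetical.

```python
def hallucination_rate(answers):
    """Share of absent-object questions the model wrongly answers 'yes'.

    answers: iterable of (object_present, model_said_yes) boolean pairs.
    This definition is an assumption, not the benchmark's stated metric.
    """
    absent = [said_yes for present, said_yes in answers if not present]
    if not absent:
        raise ValueError("no questions about absent objects")
    return sum(absent) / len(absent)


answers = [
    (True, True),    # object exists, model says yes: correct
    (False, False),  # object absent, model says no: correct
    (False, True),   # object absent, model says yes: hallucination
    (False, False),  # object absent, model says no: correct
]

print(f"{hallucination_rate(answers):.2%}")
```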

"It will be exciting to see how 3D-GRAND helps robots better understand space and take on different spatial perspectives, potentially improving how they communicate and collaborate with humans," said Chai.

More information: Jianing Yang et al, 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination, arXiv (2024). DOI: 10.48550/arxiv.2406.05132

Journal information: arXiv

Provided by University of Michigan College of Engineering

Citation: AI generates data to help embodied agents ground language to 3D world (2025, June 16) retrieved 16 June 2025 from https://techxplore.com/news/2025-06-ai-generates-embodied-agents-ground.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
