Vision-language model creates plans for automated inspection of environments

June 19, 2025
Ingrid Fadelli, contributing writer

Gaby Clark, scientific editor

Robert Egan, associate editor

Editors' notes: This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility: fact-checked, preprint, trusted source, proofread.

Figure showing the pipeline of the team's method. The input to their method includes a text description and a 3D environmental map, and the output consists of smooth trajectories that conform to the user's text description, which include targets, orders, and spatial relationships. Credit: Sun et al.

Recent advances in the field of robotics have enabled the automation of various real-world tasks, ranging from the manufacturing or packaging of goods in many industry settings to the precise execution of minimally invasive surgical procedures. Robots could also be helpful for inspecting infrastructure and environments that are hazardous or difficult for humans to access, such as tunnels, dams, pipelines, railways and power plants.

Despite robots' promise for safely assessing real-world environments, most inspections are still carried out by humans. In recent years, computer scientists have been trying to develop computational models that can plan the trajectories robots should follow when inspecting specific environments and ensure they execute the actions needed to complete their missions.

Researchers at Purdue University and LightSpeed Studios recently introduced a new training-free computational technique for generating inspection plans based on written descriptions, which could guide the movements of robots as they inspect specific environments. Their proposed approach, outlined in a paper published on the arXiv preprint server, specifically relies on vision-language models (VLMs), which can process both images and written texts.

"Our paper was inspired by real-world challenges in automated inspection, where generating task-specific inspection routes efficiently is critical for applications like infrastructure monitoring," Xingpeng Sun, first author of the paper, told Tech Xplore.

"While most existing approaches use Vision-Language Models (VLMs) for exploring unknown environments, we take a novel direction by leveraging VLMs to navigate known 3D scenes for fine-grained robot inspection planning tasks using natural language instructions."

The key objective of this recent study by Sun and his colleagues was to develop a computational model enabling the streamlined generation of inspection plans tailored to specific needs or missions. In addition, they wanted the model to work well without fine-tuning VLMs on large amounts of data, as most other machine learning-based generative models require.

Outputs of the team's method, with inspection trajectories drawn in red. Robot-agent camera frames of selected points of interest (POIs) are attached on the left side to highlight text conformity, with the corresponding orientations marked along the trajectory. More visual comparisons with previous methods are shown in the supplemental video. Credit: arXiv (2025). DOI: 10.48550/arxiv.2506.02917

"We propose a training-free pipeline that uses a pre-trained VLM (e.g., GPT-4o) to interpret inspection targets described in natural language along with relevant images," explained Sun.

"The model evaluates candidate viewpoints based on semantic alignment, and we further leverage GPT-4o to reason about relative spatial relationships (e.g., inside/outside the target) using multi-view imagery. An optimized 3D inspection trajectory is then generated by solving a Traveling Salesman Problem (TSP) using Mixed-Integer Programming that accounts for semantic relevance, spatial order, and location constraints."
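The viewpoint-evaluation stage described above can be sketched as follows. Note that `query_vlm` here is a hypothetical stand-in for an actual GPT-4o call, and the names, file paths, scores, and threshold are illustrative assumptions, not details from the paper:

```python
# Minimal sketch of VLM-based viewpoint selection. `query_vlm` is a stub
# standing in for a real vision-language-model call (e.g., GPT-4o given a
# rendered view plus the natural-language instruction); everything else
# is illustrative.
from dataclasses import dataclass

@dataclass
class Viewpoint:
    name: str
    image_path: str           # rendered view of the known 3D map
    position: tuple           # (x, y, z) camera position

def query_vlm(instruction: str, image_path: str) -> float:
    """Placeholder for a VLM call rating how well the view at
    `image_path` matches `instruction`, on a 0.0-1.0 scale.
    Stub logic: a view "matches" if its filename appears in the query."""
    words = instruction.lower().split()
    return 1.0 if any(w in image_path for w in words) else 0.1

def select_viewpoints(instruction, candidates, threshold=0.5):
    """Keep only candidate viewpoints whose semantic-alignment score
    clears the threshold."""
    scored = [(query_vlm(instruction, v.image_path), v) for v in candidates]
    return [v for score, v in scored if score >= threshold]

candidates = [
    Viewpoint("valve", "renders/valve.png", (4, 1, 2)),
    Viewpoint("tank", "renders/tank.png", (2, 5, 1)),
    Viewpoint("door", "renders/door.png", (6, 4, 1)),
]
kept = select_viewpoints("inspect the valve and then the tank", candidates)
print([v.name for v in kept])  # → ['valve', 'tank']
```

A real system would replace the stub with an API call that submits the rendered image and the instruction to the VLM and parses a numeric relevance score from its response.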

The TSP is a classical optimization problem that seeks the shortest possible route connecting multiple locations on a map, here extended with constraints reflecting the characteristics of the environment. After solving this problem, the model refines the result into smooth trajectories for the inspecting robot, along with optimal camera viewpoints for capturing sites of interest.
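The ordering step can be illustrated with a toy version of the problem. The paper formulates it as a Mixed-Integer Program with semantic and spatial-order constraints; the sketch below, under simplifying assumptions, instead solves the plain TSP exactly by enumeration, which is feasible because a single inspection plan involves only a handful of points of interest (all coordinates are made up for illustration):

```python
# Toy TSP over a few candidate inspection viewpoints. Exact enumeration
# stands in for the paper's Mixed-Integer Programming formulation, which
# additionally encodes semantic relevance and spatial-order constraints.
import itertools
import math

# Hypothetical 3D viewpoints (x, y, z) surviving the VLM selection stage.
viewpoints = {
    "pump_station": (0.0, 0.0, 1.5),
    "valve_A": (4.0, 1.0, 2.0),
    "tank_inlet": (2.0, 5.0, 1.0),
    "control_panel": (6.0, 4.0, 1.5),
}
start = (0.0, 0.0, 0.0)  # robot's starting pose

def dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.dist(a, b)

def shortest_tour(points, start):
    """Exact open-ended TSP: try every visiting order and keep the
    shortest path that begins at `start` and visits each point once."""
    names = list(points)
    best_order, best_len = None, float("inf")
    for order in itertools.permutations(names):
        length = dist(start, points[order[0]])
        length += sum(dist(points[a], points[b])
                      for a, b in zip(order, order[1:]))
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len

order, length = shortest_tour(viewpoints, start)
print(order, round(length, 2))
```

The resulting visiting order would then be densified into a smooth, collision-free trajectory; enumeration scales factorially, which is why the authors' MIP formulation is the right tool once ordering constraints and larger instances enter the picture.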

"Our novel training-free VLM-based approach for robot inspection planning efficiently translates natural language queries into smooth, accurate 3D inspection planning trajectories for robots," said Sun and his advisor Dr. Aniket Bera. "Our findings also reveal that state-of-the-art VLMs, such as GPT-4o, exhibit strong spatial reasoning capabilities when interpreting multi-view images."

Sun and his colleagues evaluated their inspection-plan generation model in a series of tests, asking it to create plans for inspecting various real-world environments from images of those environments. The results were very promising: the model outlined smooth trajectories and optimal camera viewpoints for completing the desired inspections, predicting spatial relations with an accuracy of over 90%.

As part of their future studies, the researchers plan to develop and test their approach further to enhance its performance across a wide range of environments and scenarios. The model could then be assessed using real robotic systems and eventually deployed in real-world settings.

"Our next steps include extending the method to more complex 3D scenes, integrating active visual feedback to refine plans on the fly, and combining the pipeline with robot control to enable closed‑loop physical inspection deployment," added Sun and Bera.


More information: Xingpeng Sun et al, Text-guided Generation of Efficient Personalized Inspection Plans, arXiv (2025). DOI: 10.48550/arxiv.2506.02917

Journal information: arXiv

© 2025 Science X Network

Citation: Vision-language model creates plans for automated inspection of environments (2025, June 19), retrieved 19 June 2025 from https://techxplore.com/news/2025-06-vision-language-automated-environments.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.

