GUI agent model UI-TARS can automate tasks such as finding and booking airline tickets

January 23, 2025 report


Overview of UI-TARS, illustrating the architecture of the model and its core capabilities. Credit: arXiv (2025). DOI: 10.48550/arxiv.2501.12326

A team of software engineers, AI specialists and programmers at Tsinghua University, working with TikTok parent company ByteDance, has announced the development of a graphical user interface (GUI) agent model called UI-TARS. The team describes the model in a paper posted to the arXiv preprint server.

Over the past decade, AI applications have flourished. Some of the best known are LLMs such as ChatGPT, but others have been under development to serve a variety of purposes. One application is assisting computer users in carrying out mundane tasks, such as finding the cheapest airline fare for a flight between two cities and then buying tickets for it. Such tasks typically involve time-consuming web browsing.

AI researchers have suggested that such tasks could be automated by smart agents. In this new study, the team in China has done just that with the development of UI-TARS, a GUI agent model that can be used locally on a personal computer or via the cloud on other devices.

The model was trained using 50 billion tokens that represented characteristics of a GUI (via screenshots), such as those found on traditional web pages. Training also involved reflection tuning, which meant the model was trained to learn from its mistakes and then adapt, modifying how it approached different or unknown situations.

When running UI-TARS, a user is presented with two tabs. One shows the "thinking process" that the app goes through as it carries out its overall task. The other shows the websites, files or other GUIs that the app is working with. Thus, if it were used to book a flight, a user could see the airline websites being viewed and could then switch over to see what the app was doing with them.

At the end of the process, the user is presented with the final web page prompting confirmation of the ticket purchase. In testing their model, the team found that it outperformed other AI models such as GPT-4o and Gemini-2.0.

More information: Yujia Qin et al, UI-TARS: Pioneering Automated GUI Interaction with Native Agents, arXiv (2025). DOI: 10.48550/arxiv.2501.12326

UI-TARS: github.com/bytedance/UI-TARS

Journal information: arXiv

© 2025 Science X Network

Citation: GUI agent model UI-TARS can automate tasks such as finding and booking airline tickets (2025, January 23) retrieved 24 January 2025 from https://techxplore.com/news/2025-01-gui-agent-automate-tasks-airline.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
