January 23, 2025 report
UI-TARS GUI agent model can automate tasks such as finding and booking airline tickets

A team of software engineers, AI specialists and programmers at Tsinghua University, working with TikTok parent company ByteDance, has announced the development of a graphical user interface (GUI) agent model called UI-TARS. The team describes the model in a paper posted to the arXiv preprint server.
Over the past decade, AI applications have flourished. Some of the best known are LLMs such as ChatGPT, but others have been under development to serve a variety of purposes. One application is assisting computer users with mundane tasks, such as finding the cheapest airline fare for a flight between two cities and then buying tickets for it. Such tasks typically involve time-consuming web browsing.
AI researchers have suggested that such tasks could be automated by smart agents. In this new study, the team in China has done just that with the development of UI-TARS, a GUI agent model that can be used locally on a personal computer or via the cloud on other devices.
The model was trained using 50 billion tokens that represented characteristics of a GUI (via screenshots), such as those found on traditional web pages. Training also involved reflection tuning, which meant the model was trained to learn from its mistakes and then adapt, modifying how it approached different or unfamiliar situations.
When running UI-TARS, a user is presented with two tabs. One shows the "thinking process" that the app is undergoing as it goes about its overall task. The other shows the websites, files or other GUIs that the app is working with. Thus, if it were used to book a flight, a user could see the airline websites being viewed and could then switch over to see what the app was doing with them.
At the end of the process, the user is presented with the final web page prompting confirmation of the ticket purchase. In testing their model, the team found that it outperformed other AI models such as GPT-4o and Gemini-2.0.
More information: Yujia Qin et al, UI-TARS: Pioneering Automated GUI Interaction with Native Agents, arXiv (2025). DOI: 10.48550/arxiv.2501.12326
UI-TARS: github.com/bytedance/UI-TARS
Journal information: arXiv
© 2025 Science X Network
Citation: UI-TARS GUI agent model can automate tasks such as finding and booking airline tickets (2025, January 23) retrieved 24 January 2025 from https://techxplore.com/information/2025-01-gui-agent-automate-tasks-airline.html This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
