Faster, smarter, more open: Study shows new algorithms accelerate AI models

July 16, 2025

Gaby Clark, scientific editor

Andrew Zinin, lead editor

Just as people from different countries speak different languages, AI models also create various internal "languages"—a unique set of tokens understood only by each model. Until recently, there was no way for models developed by different companies to communicate directly, collaborate or combine their strengths to improve performance.

This week, at the International Conference on Machine Learning (ICML) in Vancouver, Canada, scientists from the Weizmann Institute of Science and Intel Labs are presenting a new set of algorithms that overcome this barrier, enabling users to benefit from the combined computational power of AI models working together. The new algorithms, already available to millions of AI developers around the world, speed up large language models (LLMs), today's leading models of generative AI, by 1.5 times on average.

The research is published on the arXiv preprint server.

LLMs, such as ChatGPT and Gemini, are powerful tools, but they come with significant drawbacks: They are slow and consume large amounts of computing power. In 2022, major tech companies realized that AI models, like people, could benefit from collaboration and division of labor. This led to the development of a method called speculative decoding, in which a small, fast model, possessing relatively limited knowledge, makes a first guess while answering a user's query, and a larger, more powerful but slower model reviews and corrects the answer if needed.
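The draft-and-verify loop described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the implementation used by any company: the two "models" are hypothetical toy functions (`target_model`, `draft_model`) that map a token sequence to the next token, decoding is greedy, and the large model is called once per position rather than checking all guesses in a single batched pass as a real system would.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: the model generates every token on its own."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: the draft guesses, the target verifies."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1) The small model proposes up to k tokens ahead.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2) The large model checks the guesses in order. A real system
        #    scores all k positions in one forward pass; this toy calls
        #    the target once per position for clarity.
        for guess in proposal:
            expected = target(out)
            out.append(expected)      # always keep the target's own token
            if expected != guess:
                break                 # first mismatch: discard remaining guesses
    return out


# Hypothetical toy models, for illustration only.
def target_model(ctx: List[Token]) -> Token:
    return (ctx[-1] * 7 + 3) % 50


def draft_model(ctx: List[Token]) -> Token:
    guess = target_model(ctx)
    # Deliberately wrong at every third position to exercise the correction path.
    return guess if len(ctx) % 3 else (guess + 1) % 50


baseline = greedy_decode(target_model, [1, 2], 20)
accelerated = speculative_decode(target_model, draft_model, [1, 2], 20)
```

Because every emitted token is the one the large model would have chosen, `accelerated` and `baseline` are identical in this sketch. The speed gain in practice comes from the large model verifying several guesses at once instead of producing them one by one.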

Speculative decoding was quickly adopted by tech giants because it maintains 100% accuracy—unlike most acceleration techniques, which reduce output quality. But it had one big limitation: Both models had to "speak" the exact same digital language, which meant that models developed by different companies could not be combined.

"Tech giants adopted speculative decoding, benefiting from faster performance and saving billions of dollars a year in cost of processing power, but they were the only ones to have access to small, faster models that speak the same language as larger models," explains Nadav Timor, a Ph.D. student in Prof. David Harel's research team in Weizmann's Computer Science and Applied Mathematics Department, who led the new development.

"In contrast, a startup seeking to benefit from speculative decoding had to train its own small model that matched the language of the big one, and that takes a great deal of expertise and costly computational resources."

The new algorithms developed by Weizmann and Intel researchers allow developers to pair any small model with any large model, causing them to work as a team. To overcome the language barrier, the researchers came up with two solutions.

First, they designed an algorithm that allows an LLM to translate its output from its internal token language into a shared format that all models can understand. Second, they created another algorithm that prompts the models to rely mainly on tokens that have the same meaning across models, similar to words like "banana" or "internet" that are nearly identical across human languages.
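A toy Python sketch may help make the two ideas concrete. It is not the researchers' code: the `ToyTokenizer` class, both vocabularies, and the helper functions are hypothetical, and the sketch assumes that plain text serves as the shared format and that "same meaning" means an identical token string in both vocabularies.

```python
from typing import Dict, List


class ToyTokenizer:
    """Greedy longest-match tokenizer over a fixed vocabulary (hypothetical)."""

    def __init__(self, pieces: List[str]):
        self.id_of: Dict[str, int] = {p: i for i, p in enumerate(pieces)}
        self.piece_of: Dict[int, str] = {i: p for p, i in self.id_of.items()}
        self.max_len = max(len(p) for p in pieces)

    def encode(self, text: str) -> List[int]:
        ids, pos = [], 0
        while pos < len(text):
            # Try the longest candidate piece first, then shorter ones.
            for size in range(min(self.max_len, len(text) - pos), 0, -1):
                piece = text[pos:pos + size]
                if piece in self.id_of:
                    ids.append(self.id_of[piece])
                    pos += size
                    break
            else:
                raise ValueError(f"cannot tokenize {text[pos:]!r}")
        return ids

    def decode(self, ids: List[int]) -> str:
        return "".join(self.piece_of[i] for i in ids)


# Two models with different internal "languages" (made-up vocabularies).
draft_tok = ToyTokenizer(["ban", "ana", " ", "is", "b", "a", "n", "i", "s"])
target_tok = ToyTokenizer(["banana", " ", "is", "b", "a", "n", "i", "s"])


def translate_via_text(draft_ids: List[int]) -> List[int]:
    """Idea 1: decode draft tokens to text, then re-encode for the target."""
    return target_tok.encode(draft_tok.decode(draft_ids))


def shared_token_map() -> Dict[int, int]:
    """Idea 2: map draft ids to target ids for token strings both models share."""
    shared = set(draft_tok.id_of) & set(target_tok.id_of)
    return {draft_tok.id_of[p]: target_tok.id_of[p] for p in shared}


def pick_shared_token(draft_scores: Dict[int, float], mapping: Dict[int, int]) -> int:
    """Choose the draft's best-scoring shared token and return its target id."""
    allowed = [tok for tok in draft_scores if tok in mapping]
    best = max(allowed, key=lambda tok: draft_scores[tok])
    return mapping[best]


draft_ids = draft_tok.encode("banana is")      # draft splits "banana" into "ban" + "ana"
target_ids = translate_via_text(draft_ids)     # target sees the single token "banana"
mapping = shared_token_map()
```

In this toy setup the draft model's token "ban" has no counterpart in the target vocabulary, so it is absent from `mapping`, and `pick_shared_token` skips it in favor of the best-scoring token that both models understand.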

"At first, we worried that too much information would be 'lost in translation' and that different models wouldn't be able to collaborate effectively," says Timor. "But we were wrong. Our algorithms speed up the performance of LLMs by up to 2.8 times, leading to massive savings in spending on processing power."

The significance of this research has been recognized by ICML organizers, who selected the study for public presentation—a distinction granted to only about 1% of the 15,000 submissions received this year. "We have solved a core inefficiency in generative AI," says Oren Pereg, a senior researcher at Intel Labs and co-author of the study. "This isn't just a theoretical improvement; these are practical tools that are already helping developers build faster and smarter applications."

In the past several months, the team released their algorithms on the open-source AI platform Hugging Face Transformers, making them freely available to developers around the world. The algorithms have since become part of standard tools for running efficient AI processes.

"This new development is especially important for edge devices, from phones and drones to autonomous cars, which must rely on limited computing power when not connected to the internet," Timor adds. "Imagine, for example, a self-driving car that is guided by an AI model. In this case, a faster model can make the difference between a safe decision and a dangerous error."

Also participating in the study were Dr. Jonathan Mamou, Daniel Korat, Moshe Berchansky and Moshe Wasserblat from Intel Labs and Gaurav Jain from d-Matrix. Prof. David Harel is the incumbent of the William Sussman Professorial Chair of Mathematics.

More information: Nadav Timor et al, Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies, arXiv (2025). DOI: 10.48550/arxiv.2502.05202

Journal information: arXiv

Provided by Weizmann Institute of Science

Citation: Faster, smarter, more open: Study shows new algorithms accelerate AI models (2025, July 16) retrieved 16 July 2025 from https://techxplore.com/news/2025-07-faster-smarter-algorithms-ai.html
