July 9, 2025
Practical changes could reduce AI energy demand by up to 90%
Lisa Lock, scientific editor
Andrew Zinin, lead editor

Artificial intelligence (AI) can be made more sustainable by making practical changes, such as reducing the number of decimal places used in AI models, shortening responses, and using smaller AI models, according to research from UCL published in a new UNESCO report.
In recent years, the use of generative AI has expanded rapidly, with large language models (LLMs) developed by companies such as OpenAI, Meta and Google becoming household names. For example, OpenAI's ChatGPT service, powered by the GPT-4 LLM, receives about 1 billion queries each day.
Each generation of LLMs has become more sophisticated than the last, better able to perform tasks like text generation or knowledge retrieval. This has led to a vast and increasing demand on resources such as electricity and water, which are needed to run the data centers where these AI models are trained and deployed.
The report, which will be presented this week at the AI for Good Global Summit in Geneva, assesses the potential impact of existing solutions to the problem that, if adopted more widely, could significantly reduce AI's energy and resource demand.
Researchers from UCL Computer Science conducted a series of experiments on Meta's LLaMA 3.1 8B model to assess how changes to the way AI models are configured and used affect both the energy they need and their performance. This model was chosen as it is open source and fully modifiable, enabling the researchers to compare the unoptimized version against a range of optimization techniques (which is not possible with closed models like GPT-4).
They found that by rounding down numbers used in the models' internal calculations, shortening user instructions and AI responses, and using smaller AI models specialized to perform certain tasks, a combined energy reduction of 90% could be achieved compared to using a large all-purpose AI model.
Professor Ivana Drobnjak, an author of the report from UCL Computer Science and a member of the UNESCO Chair in AI at UCL, said, "Our research shows that there are relatively simple steps we can take to drastically reduce the energy and resource demands of generative AI, without sacrificing accuracy and without inventing entirely new solutions.
"Though some AI platforms are already exploring and implementing solutions such as the ones we propose, there are many others besides the three that we looked at. Wholesale adoption of energy-saving measures as standard would have the greatest impact."
Rounding down to save energy
In the first experiment, the researchers assessed the accuracy of Meta's LLaMA 3.1 8B model when performing common tasks (summarizing texts, translating languages and answering general knowledge questions), alongside its energy usage, under different conditions.
In a process called tokenization, LLMs convert the words from the user's prompt into numbers (tokens), which are used to perform the calculations involved in the task, before converting the numbers back into words to provide a response.
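The round trip from words to numbers and back can be illustrated with a toy example. This is only a sketch: it assigns one integer per whole word, whereas production LLMs such as LLaMA use subword tokenizers with much larger vocabularies. The function names here are hypothetical and are not taken from the report.

```python
# Toy word-level tokenizer: an illustration only, not how LLaMA tokenizes text.

def build_vocab(texts):
    """Give each distinct lowercase word an integer ID, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert words to their integer IDs."""
    return [vocab[word] for word in text.lower().split()]


def decode(ids, vocab):
    """Convert integer IDs back to words."""
    id_to_word = {token_id: word for word, token_id in vocab.items()}
    return " ".join(id_to_word[token_id] for token_id in ids)


vocab = build_vocab(["the model reads the prompt"])
ids = encode("the prompt", vocab)
print(ids)                 # [0, 3]
print(decode(ids, vocab))  # the prompt
```

Because a model processes every token in the prompt and generates the response one token at a time, the number of tokens is a rough proxy for the amount of computation involved.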
By applying a method called quantization (storing the numbers used in calculations at lower precision, much like rounding them to fewer decimal places), the energy usage of the model dropped by up to 44% while maintaining at least 97% accuracy compared to the baseline. Lower-precision arithmetic takes less work, in much the same way as most people could calculate two plus two much more quickly than 2.34 plus 2.17.
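As a rough illustration of the idea, the sketch below rounds a handful of floating-point values to 8-bit integers that share a single scale factor, then converts them back to see how much precision was lost. The article does not say which quantization scheme the researchers used, so the symmetric int8 scheme and the example weights here are assumptions chosen for simplicity.

```python
# Minimal symmetric int8 quantization sketch (assumed scheme, made-up weights).

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.8132, -0.2471, 0.0049, -1.27, 0.5]  # hypothetical model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q)  # [81, -25, 0, -127, 50]
print(f"largest rounding error: {max_error:.4f}")
```

Each value now fits in one byte instead of the two or four bytes typical of floating-point weights, and the rounding error per value is at most half the scale factor. Less data to move and simpler arithmetic are the usual reasons quantized models need less energy.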
The team also compared LLaMA 3.1 8B to smaller AI models built to specialize in each of the three tasks. Small models used 15 times less energy for summarization, 35 times less energy for translation and 50 times less energy for question answering.
Accuracy was comparable to the larger model, with the small models performing 4% more accurately for summarization, 2% for translation and 3% for question answering.
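To compare these figures with the percentage reductions quoted elsewhere in the article, "N times less energy" can be read as using 1/N of the energy. The short calculation below makes that conversion. The reading of "times less" is an assumption about the article's wording, and the function name is illustrative.

```python
# Convert "N times less energy" into a percentage reduction,
# assuming it means the small model uses 1/N of the large model's energy.

def percent_reduction(times_less):
    return (1 - 1 / times_less) * 100


factors = {"summarization": 15, "translation": 35, "question answering": 50}
for task, factor in factors.items():
    print(f"{task}: {percent_reduction(factor):.1f}% less energy")
# summarization: 93.3% less energy
# translation: 97.1% less energy
# question answering: 98.0% less energy
```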
Shortening questions and responses
In the second experiment, the researchers assessed the impact on energy usage of changing the length of the user's prompt (instructions) and the model's response (answer).
They calculated energy consumption for 1,000 scenarios, varying the length of the user prompt and the model's response from approximately 400 English words down to 100 English words.
The longest combination (400-word prompt and 400-word response) used 1.03 kilowatt-hours (kWh) of electricity, enough to power a 100-watt lightbulb for 10 hours or a fridge-freezer for 26 hours.
Halving the user prompt length to 200 words reduced the energy expenditure by 5%, while halving the model response length to 200 words reduced energy consumption by 54%.
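The arithmetic behind these comparisons is straightforward to check. The sketch below starts from the 1.03 kWh figure and applies the two quoted reductions separately. The article does not say how many queries the 1.03 kWh figure covers, so the code treats it simply as the baseline, and the helper names are illustrative.

```python
# Back-of-envelope check of the energy figures quoted in the article.

BASELINE_KWH = 1.03  # 400-word prompt and 400-word response


def hours_powered(energy_kwh, device_watts):
    """Hours a device of the given wattage can run on the given energy."""
    return energy_kwh * 1000 / device_watts


def after_reduction(energy_kwh, percent):
    """Energy remaining after a percentage reduction."""
    return energy_kwh * (1 - percent / 100)


print(f"100 W bulb: {hours_powered(BASELINE_KWH, 100):.1f} hours")         # 10.3 hours
print(f"200-word prompt: {after_reduction(BASELINE_KWH, 5):.2f} kWh")      # 0.98 kWh
print(f"200-word response: {after_reduction(BASELINE_KWH, 54):.2f} kWh")   # 0.47 kWh
```

The gap between the two results reflects the article's finding that response length matters far more than prompt length: halving the response saves roughly ten times as much energy as halving the prompt.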
Assessing real-world impact
To assess the global impact of the optimizations tested, the authors asked LLaMA 3.1 8B to provide an answer to a specific question. They then calculated the energy required for it to do so and multiplied this by the estimated daily number of requests for this sort of task by users of the popular AI service ChatGPT.
They estimated that using quantization, combined with cutting down user prompt and AI response length from 300 to 150 words, could reduce energy consumption by 75%.
In a single day, this saving would be equivalent to the amount of electricity needed to power 30,000 average UK households (assuming 7.4 kilowatt hours per house per day). Importantly, this saving would be achieved without the model losing the ability to address more complex general tasks.
For repetitive tasks such as translation and summarization, the biggest savings were achieved by using small, specialized models and a reduced prompt/response length, which reduced energy usage by over 90% (enough to power 34,000 UK households for a day).
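The household comparisons can be converted back into energy units using the article's figure of 7.4 kWh per household per day. The sketch below does that conversion. The final line, which infers a daily baseline from the 75% figure, is a back-of-envelope implication that the article does not state and should be treated as an assumption.

```python
# Convert the household equivalents in the article into megawatt-hours per day.

KWH_PER_HOUSEHOLD_PER_DAY = 7.4  # average UK household, as stated in the article


def households_to_mwh(households):
    """Daily electricity use of the given number of households, in MWh."""
    return households * KWH_PER_HOUSEHOLD_PER_DAY / 1000


general_saving = households_to_mwh(30_000)      # quantization + shorter prompts/responses
specialized_saving = households_to_mwh(34_000)  # small models + shorter prompts/responses

print(f"general tasks: {general_saving:.1f} MWh saved per day")         # 222.0
print(f"repetitive tasks: {specialized_saving:.1f} MWh saved per day")  # 251.6

# Assumption: if 222 MWh is a 75% saving, the unoptimized daily total would be about:
implied_baseline = general_saving / 0.75
print(f"implied baseline: {implied_baseline:.0f} MWh per day")          # 296
```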
Hristijan Bosilkovski, an author of the report and a UCL MSc graduate in Data Science and Machine Learning, said, "There will be times when it makes sense to use a large, all-purpose AI model, such as for complex tasks or research and development.
"But the biggest gains in energy efficiency can be achieved by switching from large models to smaller, specialized models in certain tasks such as translation or knowledge retrieval. It's a bit like using a hammer to drive a nail, rather than a sledgehammer."
Looking to the future
The authors of the report say that as competition in generative AI models increases, it will become more important for companies to streamline their models and to use smaller models better suited to certain tasks.
Leona Verdadero, an author of the report and a Program Specialist from UNESCO's Digital Policies and Digital Transformation Section, said, "Too often, users rely on oversized AI models for simple tasks. It's like using a fire hose to water a house plant. By sharing practical techniques more broadly, we can empower people to make smarter, more intentional choices. Matching the appropriate-sized model to the job isn't just more efficient, it's essential to making the AI revolution both sustainable and accessible."
Dr. Maria Perez Ortiz, an author of the report from UCL Computer Science and a member of the UNESCO Chair in AI at UCL, said, "The future of generative AI models lies in efficiency, not excess. We want to solve challenges with smarter models, not necessarily by consuming more resources. The strategies proposed in our report not only reduce the energy expenditure and improve model speed, but they also require considerably less computational power and resources.
"They are readily accessible, and some are already used for this purpose by the newer and upcoming generation of AI models."
Professor Drobnjak added, "When we talk about the future of resource-efficient AI, I often use two metaphors. One is a 'collection of brains,' lots of separate specialist models that pass messages back and forth, which can save energy but feel fragmented. The other metaphor, and the future that I'm most excited about, looks more like a single brain with distinct regions, which is tightly connected, sharing one memory, yet able to switch on only the circuits it needs. It's like bringing the efficiency of a finely tuned cortex to generative AI: smarter, leaner, and far less resource hungry."
Provided by University College London
Citation: Practical changes could reduce AI energy demand by up to 90% (2025, July 9) retrieved 9 July 2025 from https://techxplore.com/news/2025-07-ai-energy-demand.html