Self-adapting LLMs behave more like students to absorb new knowledge

November 12, 2025

Sadie Harley, scientific editor

Robert Egan, associate editor

In an MIT classroom, a professor lectures while students diligently write down notes they will reread later to study and internalize key information ahead of an exam.

Humans know how to learn new information, but large language models can't do this in the same way. Once a fully trained LLM has been deployed, its "brain" is static and can't permanently adapt itself to new knowledge.

This means that if a user tells an LLM something important today, it won't remember that information the next time this person starts a new conversation with the chatbot.

Now, a new approach developed by MIT researchers enables LLMs to update themselves in a way that permanently internalizes new information. Just like a student, the LLM generates its own study sheets from a user's input, which it uses to memorize the information by updating its inner workings. The work is published on the arXiv preprint server.

The model generates multiple self-edits to learn from one input, then applies each one to see which improves its performance the most. This trial-and-error process teaches the model the best way to train itself.

The researchers found this approach improved the accuracy of LLMs at question-answering and pattern-recognition tasks, and it enabled a small model to outperform much larger LLMs.

While there are still limitations that must be overcome, the technique could someday help artificial intelligence agents consistently adapt to new tasks and achieve changing goals in evolving environments.

"Just like humans, complex AI systems can't remain static for their entire lifetimes. These LLMs are not deployed in static environments. They are constantly facing new inputs from users. We want to make a model that is a bit more human-like—one that can keep improving itself," says Jyothish Pari, an MIT graduate student and co-lead author of the paper on this technique.

He is joined on the paper by co-lead author Adam Zweiger, an MIT undergraduate; graduate students Han Guo and Ekin Akyürek; and senior authors Yoon Kim, an assistant professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, an assistant professor in EECS and member of CSAIL.

The research will be presented at the Conference on Neural Information Processing Systems.

Teaching the model to learn

LLMs are neural network models that have billions of parameters, called weights, that contain the model's knowledge and process inputs to make predictions. During training, the model adapts these weights to learn new information contained in its training data.

But once the model is deployed, its weights are fixed and can no longer be permanently updated.

However, LLMs are very good at a process called in-context learning, in which a trained model learns a new task by seeing a few examples. These examples guide the model's responses, but the knowledge disappears before the next conversation.
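
As a rough illustration of in-context learning from the caller's side, the sketch below packs a few worked examples into a prompt ahead of a new query. The helper name `build_few_shot_prompt` and the "Input:/Output:" layout are assumptions made for this example, not something taken from the paper. The point is that the examples exist only in the prompt text, so nothing carries over to the next conversation.

```python
def build_few_shot_prompt(examples, query):
    """Format (input, output) demonstration pairs, then the unanswered query.

    Hypothetical helper: the prompt layout is an illustrative assumption.
    """
    lines = []
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
    # The query is left open so the model completes the final "Output:" line.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Two demonstrations of a pluralization task, followed by a new word.
examples = [("cat", "cats"), ("box", "boxes")]
prompt = build_few_shot_prompt(examples, "bus")
print(prompt)
```

The model's weights are untouched here: drop the examples from the prompt and the "knowledge" is gone.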

The MIT researchers wanted to leverage a model's powerful in-context learning capabilities to teach it how to permanently update its weights when it encounters new knowledge.

The framework they developed, called SEAL for "self-adapting LLMs," enables an LLM to generate new synthetic data based on an input, and then determine the best way to adapt itself and learn from that synthetic data. Each piece of synthetic data is a self-edit the model can apply.

Overview of SEAL. In each RL outer loop iteration, the model generates candidate self-edits (SE)—directives on how to update the weights—applies updates, evaluates performance on a downstream task, and uses the resulting rewards to improve the self-edit generation policy. Credit: arXiv (2025). DOI: 10.48550/arxiv.2506.10943

In the case of language, the LLM creates synthetic data by rewriting the information, and its implications, in an input passage. This is similar to how students make study sheets by rewriting and summarizing original lecture content.

The LLM does this multiple times, then quizzes itself on each self-edit to see which led to the biggest boost in performance on a downstream task like question answering. This process relies on a trial-and-error method known as reinforcement learning, in which the model receives a reward for the self-edits that yield the greatest performance boost.

Then the model memorizes the best study sheet by updating its weights to internalize the information in that self-edit.
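
The generate, apply, quiz and keep-the-best cycle described above can be sketched with a deliberately tiny stand-in. Everything in this example is an assumption made for illustration: the "model" is a plain dictionary of facts rather than a neural network, "applying a self-edit" copies facts into it rather than running gradient updates, and the function names are hypothetical. The sketch also leaves out the step in which rewards are used to improve how self-edits are generated.

```python
import random


def evaluate(model, quiz):
    """Fraction of quiz questions the toy model answers correctly."""
    correct = sum(1 for question, answer in quiz if model.get(question) == answer)
    return correct / len(quiz)


def apply_self_edit(model, self_edit):
    """Stand-in for a weight update: return a copy that has absorbed the self-edit."""
    updated = dict(model)
    updated.update(self_edit)
    return updated


def generate_self_edits(passage_facts, num_candidates, rng):
    """Stand-in for the LLM rewriting a passage: each candidate keeps a random subset of facts."""
    candidates = []
    for _ in range(num_candidates):
        size = rng.randint(1, len(passage_facts))
        candidates.append(dict(rng.sample(sorted(passage_facts.items()), size)))
    return candidates


def best_self_edit(model, passage_facts, quiz, num_candidates=4, seed=0):
    """Try each candidate self-edit on a copy of the model and keep the most helpful one."""
    rng = random.Random(seed)
    baseline = evaluate(model, quiz)
    scored = []
    for self_edit in generate_self_edits(passage_facts, num_candidates, rng):
        candidate_model = apply_self_edit(model, self_edit)
        # Reward = improvement on the downstream quiz relative to the unedited model.
        reward = evaluate(candidate_model, quiz) - baseline
        scored.append((reward, self_edit))
    best_reward, best_edit = max(scored, key=lambda pair: pair[0])
    return best_reward, best_edit, apply_self_edit(model, best_edit)


passage_facts = {
    "capital of France": "Paris",
    "capital of Japan": "Tokyo",
    "capital of Kenya": "Nairobi",
}
quiz = list(passage_facts.items())
model = {}
reward, edit, model = best_self_edit(model, passage_facts, quiz)
print(reward, sorted(edit))
```

In this toy version, the study sheet that covers the most quiz questions wins. In the real system, the quality of a self-edit is only visible through the performance of the model that was trained on it.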

"Our hope is that the model will learn to make the best kind of study sheet—one that is the right length and has the proper diversity of information—such that updating the model based on it leads to a better model," Zweiger explains.

Choosing the best method

Their framework also allows the model to choose the way it wants to learn the information. For instance, the model can select the synthetic data it wants to use, the rate at which it learns, and how many iterations it wants to train on.

In this case, not only does the model generate its own training data, but it also configures the optimization that applies that self-edit to its weights.
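
One way to picture a self-edit that carries both data and training settings is as a small record. The sketch below is a hypothetical container written for this article, and its field names (`training_texts`, `learning_rate`, `num_epochs`) and default values are assumptions rather than the format SEAL actually uses.

```python
from dataclasses import dataclass


@dataclass
class SelfEdit:
    """Hypothetical container: the synthetic data plus how to train on it."""

    training_texts: list
    learning_rate: float = 1e-4  # illustrative default, not from the paper
    num_epochs: int = 3          # illustrative default, not from the paper

    def __post_init__(self):
        # Reject configurations that could not drive a meaningful update.
        if not self.training_texts:
            raise ValueError("a self-edit needs at least one training text")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.num_epochs < 1:
            raise ValueError("num_epochs must be at least 1")


def training_schedule(edit):
    """List every (epoch, text, learning_rate) step the self-edit asks for."""
    return [
        (epoch, text, edit.learning_rate)
        for epoch in range(edit.num_epochs)
        for text in edit.training_texts
    ]


edit = SelfEdit(
    training_texts=["Fact A restated.", "An implication of fact A."],
    learning_rate=5e-5,
    num_epochs=2,
)
steps = training_schedule(edit)
print(len(steps))
```

Treating the settings as part of the self-edit means that a single reward signal can shape both what the model studies and how hard it studies it.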

"As humans, we know how we learn best. We want to grant that same ability to large language models. By providing the model with the ability to control how it digests this information, it can figure out the best way to parse all the data that are coming in," Pari says.

SEAL outperformed several baseline methods across a range of tasks, including learning a new skill from a few examples and incorporating knowledge from a text passage. On question answering, SEAL improved model accuracy by nearly 15%, and on some skill-learning tasks it boosted the success rate by more than 50%.

But one limitation of this approach is a problem called catastrophic forgetting: As the model repeatedly adapts to new information, its performance on earlier tasks slowly declines.
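
Catastrophic forgetting can be seen even in a one-weight model. The toy below is an exaggerated illustration under assumptions chosen for this example: two deliberately conflicting tasks, a linear model with a single weight, and plain gradient descent. It is not the researchers' experimental setup. Training on the second task alone overwrites what the weight had learned for the first.

```python
def mse(weight, data):
    """Mean squared error of the model y = weight * x on a dataset."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)


def train(weight, data, learning_rate=0.05, epochs=50):
    """Plain gradient descent on mean squared error for y = weight * x."""
    for _ in range(epochs):
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * gradient
    return weight


# Two tasks that pull the single weight in opposite directions.
task_a = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
task_b = [(x, -1.0 * x) for x in (1.0, 2.0, 3.0)]

weight = train(0.0, task_a)
error_a_before = mse(weight, task_a)  # near zero: task A has been learned

weight = train(weight, task_b)        # now adapt to task B only
error_a_after = mse(weight, task_a)   # task A error has grown

print(error_a_before, error_a_after)
```

A real LLM has billions of weights and far less direct conflict between tasks, so the decline is more gradual, but the mechanism is the same: updates that help with new material are not constrained to preserve old behavior.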

The researchers plan to mitigate catastrophic forgetting in future work. They also want to apply this technique in a multi-agent setting where several LLMs train each other.

"One of the key barriers to LLMs that can do meaningful scientific research is their inability to update themselves based on their interactions with new information. Though fully deployed self-adapting models are still far off, we hope systems able to learn this way could eventually overcome this and help advance science," Zweiger says.

More information: Adam Zweiger et al, Self-Adapting Language Models, arXiv (2025). DOI: 10.48550/arxiv.2506.10943

Journal information: arXiv

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: Self-adapting LLMs behave more like students to absorb new knowledge (2025, November 12) retrieved 12 November 2025 from https://techxplore.com/news/2025-11-llms-students-absorb-knowledge.html
