June 25, 2025
Multimodal LLMs and the human brain create object representations in similar ways, study finds
by Ingrid Fadelli, contributing writer

A better understanding of how the human brain represents objects that exist in nature, such as rocks, plants, animals, and so on, could have interesting implications for research in various fields, including psychology, neuroscience and computer science. Specifically, it could help shed new light on how humans interpret sensory information and complete different real-world tasks, which could also inform the development of artificial intelligence (AI) techniques that closely emulate biological and mental processes.
Multimodal large language models (LLMs), such as the latest models underpinning the popular conversational platform ChatGPT, have proved to be highly effective computational tools for analyzing and generating text in many human languages, as well as images and even short videos.
Because the text and images these models generate are often convincing enough to pass for human-created content, multimodal LLMs could serve as interesting experimental tools for studying the underpinnings of object representations.
Researchers at the Chinese Academy of Sciences recently carried out a study aimed at better understanding how multimodal LLMs represent objects, while also trying to determine whether the object representations that emerge in these models resemble those observed in humans. Their findings are published in Nature Machine Intelligence.
"Understanding how humans conceptualize and categorize natural objects offers critical insights into perception and cognition," Changde Du, Kaicheng Fu and their colleagues wrote in their paper. "With the advent of large language models (LLMs), a key question arises: Can these models develop human-like object representations from linguistic and multimodal data?
"We combined behavioral and neuroimaging analyses to explore the relationship between object concept representations in LLMs and human cognition."

As part of their study, the researchers specifically examined the object representations emerging in the LLM ChatGPT-3.5, created by OpenAI, and in the multimodal LLM Gemini Pro Vision 1.0, developed by Google DeepMind. They asked these models to complete simple tasks known as triplet judgments. For each of these tasks, a model was presented with three objects and asked to select the two that most closely resembled each other.
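The paper's exact prompts are not reproduced here, so the following is only a minimal sketch of what a triplet-judgment query to an LLM might look like; the wording, the odd-one-out framing and the parsing logic are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch of a triplet-judgment query (not the authors' code).
# The prompt wording and the odd-one-out framing are assumptions.

def build_triplet_prompt(obj_a: str, obj_b: str, obj_c: str) -> str:
    """Ask a model which two of three objects are most similar."""
    return (
        "You will be shown three objects. Pick the two that are most similar "
        "to each other and report the odd one out.\n"
        f"Objects: 1) {obj_a}  2) {obj_b}  3) {obj_c}\n"
        "Answer with the number of the odd one out only."
    )

def parse_odd_one_out(reply: str) -> int:
    """Extract the index (1-3) of the odd one out from the model's reply."""
    for token in reply.split():
        token = token.strip(".,)!")
        if token in {"1", "2", "3"}:
            return int(token)
    raise ValueError(f"Could not parse reply: {reply!r}")

if __name__ == "__main__":
    print(build_triplet_prompt("rock", "pebble", "daisy"))
    print(parse_odd_one_out("The odd one out is 3."))
```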
"We collected 4.7 million triplet judgments from LLMs and multimodal LLMs to derive low-dimensional embeddings that capture the similarity structure of 1,854 natural objects," wrote Du, Fu and their colleagues. "The resulting 66-dimensional embeddings were stable, predictive and exhibited semantic clustering similar to human mental representations. Remarkably, the dimensions underlying these embeddings were interpretable, suggesting that LLMs and multimodal LLMs develop human-like conceptual representations of objects."
Using the large dataset of triplet judgments they collected, the researchers computed low-dimensional embeddings. These are mathematical representations that capture the similarity between objects along multiple dimensions, placing similar objects closer to each other in an abstract space.
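As a rough illustration of how such an embedding relates to triplet behavior, the sketch below scores each pair in a triplet by the dot product of the objects' embedding vectors and converts the scores into choice probabilities with a softmax, in the spirit of sparse-embedding models of similarity judgments. The random vectors stand in for learned embeddings; none of this is the authors' code.

```python
# Sketch: predicting which pair in a triplet is judged "most similar" from
# low-dimensional object embeddings. Random vectors stand in for the learned
# 66-dimensional embeddings reported in the paper.
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_dims = 1854, 66                  # figures reported in the paper
embeddings = rng.random((n_objects, n_dims))  # placeholder for learned embeddings

def triplet_choice_probs(i: int, j: int, k: int, X: np.ndarray) -> np.ndarray:
    """Softmax probabilities that pairs (i,j), (i,k), (j,k) are chosen as most similar."""
    sims = np.array([X[i] @ X[j], X[i] @ X[k], X[j] @ X[k]])
    exp = np.exp(sims - sims.max())           # numerically stable softmax
    return exp / exp.sum()

print(triplet_choice_probs(0, 1, 2, embeddings))
```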
Notably, the researchers observed that the low-dimensional embeddings they obtained reliably grouped objects into meaningful categories, such as "animals," "plants," and so on. They thus concluded that LLMs and multimodal LLMs naturally organize objects in ways that mirror how objects are represented and categorized in the human mind.
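One simple way to check this kind of semantic clustering, sketched below under assumed data, is to cluster the embeddings and compare the result with human category labels; the arrays here are random placeholders, not the study's data.

```python
# Sketch only: do embedding clusters line up with human category labels
# (e.g., "animal", "plant")? Embeddings and labels are random placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
X = rng.random((200, 66))                    # stand-in for object embeddings
human_labels = rng.integers(0, 5, size=200)  # stand-in for human category labels

predicted = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print("Adjusted Rand index:", adjusted_rand_score(human_labels, predicted))
```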
"Further analysis showed strong alignment between model embeddings and neural activity patterns in brain regions such as the extra-striate body area, para-hippocampal place area, retro-splenial cortex and fusiform face area," the team wrote. "This provides compelling evidence that the object representations in LLMs, although not identical to human ones, share fundamental similarities that reflect key aspects of human conceptual knowledge."
Overall, the results gathered by Du, Fu and their colleagues suggest that human-like natural object representations could inherently emerge in LLMs and multimodal LLMs after they are trained on large amounts of data. In the future, this study could inspire other research teams to explore how LLMs represent objects, while also potentially contributing to the further advancement of brain-inspired AI systems.
More information: Changde Du et al, Human-like object concept representations emerge naturally in multimodal large language models, Nature Machine Intelligence (2025). DOI: 10.1038/s42256-025-01049-z
Journal information: Nature Machine Intelligence
© 2025 Science X Network
Citation: Multimodal LLMs and the human brain create object representations in similar ways, study finds (2025, June 25) retrieved 25 June 2025 from https://techxplore.com/news/2025-06-multimodal-llms-human-brain-representations.html