February 12, 2025
The limits of language: AI models still lag behind humans in simple text comprehension tests

An international research team led by the URV has analyzed the capabilities of seven artificial intelligence (AI) models in understanding language and compared them with those of humans.
The results, published in the journal Scientific Reports, show that, despite their success in some specific tasks, the models do not reach a level comparable to that of humans in simple text comprehension tests.
"The ability of models to carry out complex tasks does not guarantee that they are competent in simple tasks," warned the researchers.
Large language models (LLMs) are neural networks designed to generate text autonomously from a user request. They specialize in tasks such as answering general queries, translating texts, solving problems and summarizing content.
It is often claimed that these models have capabilities similar to those of humans in terms of comprehension and reasoning, but the results of the research led by Vittoria Dentella, a researcher at the URV's Language and Linguistics Research Group, show their limitations: "LLMs do not really understand language, but simply take advantage of the statistical patterns present in their training data."
Neural networks are computational models that emulate the biological neural structures of the brain. They consist of a series of interconnected nodes, called artificial neurons. Each node receives information from other neurons, processes it and sends it on. Seen from the outside, a neural network accepts an input, processes it and returns a result.
Researchers have to train the network with information they are familiar with so that it automatically learns to process data to produce the expected response. Once trained, these networks are used in prediction tasks, data classification and filtering, pattern recognition, etc.
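As a rough illustration of the description above, the following Python sketch shows a node that receives inputs, processes them and passes the result on, and a small network that accepts an input and returns a result. It is a minimal sketch, not the architecture of any model in the study: the function names, the sigmoid activation and the hand-picked weights are all assumptions made for illustration, and no training step is shown.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of its inputs, squashed by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def forward(inputs, layers):
    """Pass the inputs through each layer in turn and return the final outputs."""
    activations = inputs
    for layer in layers:
        # Every neuron in a layer sees all outputs of the previous layer.
        activations = [neuron(activations, weights, bias) for weights, bias in layer]
    return activations

# Hypothetical hand-picked weights; a real network would learn these in training.
layers = [
    [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)],  # hidden layer: two neurons
    [([1.0, -1.0], 0.0)],                      # output layer: one neuron
]

result = forward([1.0, 0.0], layers)
print(result)  # a single value between 0 and 1
```

In a trained network the weights and biases would be adjusted automatically from known examples rather than written by hand.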
To compare the performance of humans and LLMs in text comprehension, the researchers put 40 questions to seven AI models (Bard, ChatGPT-3.5, ChatGPT-4, Falcon, Gemini, Llama2 and Mixtral), using simple grammatical structures and frequently used verbs.
At the same time, a group of 400 people, all native English speakers, were asked the same questions, and the accuracy of their answers was compared with that of the LLMs. Each question was repeated three times to assess the consistency of the answers.
The average human accuracy was 89%, far higher than that of the AI models, the best of which (ChatGPT-4) provided 83% correct answers.
The results show a big difference in performance among the text comprehension technologies: with the exception of ChatGPT-4, none of the LLMs achieved more than 70% accuracy. Humans were also more consistent when faced with repeated questions, maintaining their answers in 87% of cases. For the models, on the other hand, this figure was between 66% and 83%.
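The two measures reported above, accuracy and consistency across repeated questions, can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the article does not give the study's exact definitions, so accuracy is taken here as the share of all responses that match an answer key, and consistency as the share of questions whose three repeated answers are identical. The question IDs and answers are invented for illustration and are not data from the study.

```python
def accuracy(responses, answer_key):
    """Share of all responses (every repetition counted) that match the key."""
    total = 0
    correct = 0
    for question, answers in responses.items():
        for answer in answers:
            total += 1
            if answer == answer_key[question]:
                correct += 1
    return correct / total

def consistency(responses):
    """Share of questions whose repeated answers are all identical."""
    stable = sum(1 for answers in responses.values() if len(set(answers)) == 1)
    return stable / len(responses)

# Hypothetical answer key and responses: four questions, each asked three times.
answer_key = {"q1": "yes", "q2": "no", "q3": "yes", "q4": "no"}
responses = {
    "q1": ["yes", "yes", "yes"],  # correct and consistent
    "q2": ["no", "yes", "no"],    # mostly correct, inconsistent
    "q3": ["no", "no", "no"],     # consistent but wrong
    "q4": ["no", "no", "no"],     # correct and consistent
}

acc = accuracy(responses, answer_key)  # 8 of 12 responses match the key
con = consistency(responses)           # 3 of 4 questions have stable answers
print(f"accuracy={acc:.2f} consistency={con:.2f}")
```

The example also shows why the two figures are reported separately: a respondent can be consistent and wrong, as with the hypothetical `q3`.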
"Although LLMs can generate grammatically correct and apparently coherent texts, the results of this study suggest that, in the end, they do not understand the meaning of language in the way a human does," explains Dentella.
In reality, large language models do not interpret meaning in the way a person does, through a combination of semantic, grammatical, pragmatic and contextual elements. They work by identifying patterns in the texts they have been given, comparing them with the patterns in the information that was used to train them and then using statistically based predictive algorithms. Their apparent humanness is, therefore, an illusion.
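A toy example can make the idea of statistical pattern prediction concrete. The Python sketch below counts which word follows which in a tiny invented corpus and then predicts the most frequent follower. It is a deliberately simplified analogy and an assumption of this illustration: the LLMs in the study are large neural networks, not word-pair counters, and the corpus and function names here are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, how often each other word directly follows it."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical miniature "training data".
corpus = "the dog chased the cat . the dog chased the ball . the cat slept ."
model = train_bigrams(corpus)

print(predict_next(model, "dog"))    # chased
print(predict_next(model, "zebra"))  # None
```

The predictor produces plausible continuations purely from counts, with no representation of what a dog or a chase is, which is the kind of gap between pattern matching and understanding that the researchers describe.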
The LLMs' lack of understanding can prevent them from giving consistent answers, especially when they are subjected to repeated questions, as the study found. It also explains why the models can provide answers that are not only incorrect, but which also indicate that they have not understood the context or meaning of a concept.
This in turn means, Dentella warns, that the technology is not yet reliable enough to be used in certain critical applications: "Our research shows that the ability of LLMs to carry out complex tasks does not guarantee that they are competent in simple tasks, which are often those that require a real understanding of language."
More information: Vittoria Dentella et al, Testing AI on language comprehension tasks reveals insensitivity to underlying meaning, Scientific Reports (2024). DOI: 10.1038/s41598-024-79531-8
Journal information: Scientific Reports
Provided by University of Rovira i Virgili
Citation: The limits of language: AI models still lag behind humans in simple text comprehension tests (2025, February 12) retrieved 12 February 2025 from https://techxplore.com/information/2025-02-limitations-language-ai-lag-humans.html
