Where AI models fall short in mimicking the expressiveness of human speech

September 5, 2025


Lisa Lock

scientific editor

Andrew Zinin

lead editor


Through the Penn Undergraduate Research Mentoring Program, students Ethan Yang, Kevin Li, and Henry Huang worked with linguistics professor Jianjing Kuang to study the ability of AI models to replicate the expressiveness of human speech. Credit: University of Pennsylvania

It's not just what is said but how it's articulated that shapes the meaning of human communication, and people use intonation to highlight the most important part of a sentence. Take, for instance, the sentence "Molly mailed a melon." If someone asks, "Who mailed the melon?" people are most likely to stress the first word: "MOLLY mailed a melon." If someone asks what Molly did with the melon, the stress falls on the verb: "Molly MAILED a melon." And if the question is what Molly mailed, the response is "Molly mailed a MELON."

But put any of these questions to an artificial intelligence model that is capable of speech, and it's a different story. Jianjing Kuang, associate professor of linguistics in the School of Arts & Sciences and director of the Penn Phonetics Laboratory, says that while AI robots can articulate a word accurately, the technology to capture this use of intonation, known as prosodic focus, "is not quite there yet."

This summer, she mentored three undergraduate students (Kevin Li and Henry Huang, second-year computer science students, and Ethan Yang, a third-year mechanical engineering major) in a research project comparing human and AI speech in both production and perception. The project is part of the Penn Undergraduate Research Mentoring Program (PURM), a 10-week summer research opportunity through the Center for Undergraduate Research and Fellowships that comes with a $5,000 award.

"I've always been interested in linguistics and phonetics, but this is a really good opportunity for me to do hands-on research," says Li, who is from Kansas City, Kansas. Huang, who is from Shenzhen, China, says the experience taught him how to design an experiment and analyze data.

Using prompts that supplied different contexts, the students generated the sentence "Molly mailed a melon" on 15 AI text-to-speech (TTS) platforms, from major companies like OpenAI, Google, and Meta to smaller ones like Sesame AI and ElevenLabs. They also recorded human volunteers in Kuang's recording studio so that the AI-generated speech could be compared with the same sentence spoken by humans.

Yang, who is from Diamond Bar, California, says this project taught him how to control intonation in TTS models. The team then used the software Praat to analyze acoustic measures such as the pitch, intensity, and duration of each word.

They found that, compared to human production, most of the TTS models failed to place the focus on the correct word. As an example, Li pulled up a graph showing that when the focus is meant to fall on the word "mailed," the average duration of that word is significantly longer in human speech than in speech from any of the robots.
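One simple way to check where focus lands is to compare each word's duration with a neutral reading of the same sentence and see which word is lengthened most. The following is a minimal sketch of that idea, assuming word durations have already been measured (for example, exported from Praat). The function names are hypothetical, the numbers are made-up placeholders rather than the study's data, and this is not necessarily the analysis the team ran.

```python
from typing import Dict


def lengthening_ratios(focused: Dict[str, float],
                       neutral: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each word's duration in the focused reading to the neutral one."""
    return {word: focused[word] / neutral[word] for word in neutral}


def most_lengthened_word(focused: Dict[str, float],
                         neutral: Dict[str, float]) -> str:
    """Return the word whose duration grew the most relative to the neutral reading."""
    ratios = lengthening_ratios(focused, neutral)
    return max(ratios, key=ratios.get)


# Placeholder durations in seconds (illustrative only, NOT measured data).
neutral = {"Molly": 0.30, "mailed": 0.30, "a": 0.08, "melon": 0.40}
human_focus_mailed = {"Molly": 0.28, "mailed": 0.45, "a": 0.08, "melon": 0.38}
tts_focus_mailed = {"Molly": 0.36, "mailed": 0.31, "a": 0.08, "melon": 0.40}

print(most_lengthened_word(human_focus_mailed, neutral))
print(most_lengthened_word(tts_focus_mailed, neutral))
```

With these placeholder values, the hypothetical human reading lengthens "mailed" while the hypothetical TTS reading lengthens "Molly" instead, which is the kind of misplaced focus described above. A fuller analysis would also consider pitch and intensity, not duration alone.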

The students found "huge variability among the models," Kuang says. Some models were explicitly instructed to emphasize a certain word but could not, while others, such as those from OpenAI and Google's Gemini, were more capable. Some models emphasized more than one word, one turned the sentence into a question, and another didn't even finish the sentence. Another interesting finding, Kuang says, is that speech robots had an easier time emphasizing "Molly" than words later in the sentence.

In addition to speech production, the students ran a perception experiment, asking human listeners to rate the naturalness of an audio clip and identify whether the speaker is human or AI. Kuang says the accuracy for identifying human versus AI is very high, suggesting that AI speech is still not human-like.
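The identification accuracy mentioned here is the share of clips for which a listener's guess matches the true source. A minimal sketch of that calculation follows, assuming each trial is recorded as a (true source, listener guess) pair; the function name, the labels, and the sample trials are hypothetical placeholders, not results from the experiment.

```python
from typing import List, Tuple


def identification_accuracy(trials: List[Tuple[str, str]]) -> float:
    """Fraction of trials where the listener's guess matches the true source."""
    if not trials:
        raise ValueError("at least one trial is required")
    correct = sum(1 for truth, guess in trials if truth == guess)
    return correct / len(trials)


# Placeholder trials as (true source, listener guess); illustrative only.
trials = [
    ("human", "human"),
    ("ai", "ai"),
    ("ai", "human"),
    ("human", "human"),
    ("ai", "ai"),
]

print(identification_accuracy(trials))
```

An accuracy near 0.5 would mean listeners are guessing at chance and cannot tell the two apart; the high accuracy Kuang describes means listeners can still reliably separate human speech from AI speech.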

"The goal is to build bridges between science and industry. I do think they need us—our knowledge—to tell how good the model is and help move us closer to truly natural and expressive AI speech," she says. Kuang adds that working with AI also has implications for better understanding human speech and its uniqueness, such as why certain tasks come easily to us and how to develop better therapies for speech disorders.

Provided by University of Pennsylvania

Citation: Where AI models fall short in mimicking the expressiveness of human speech (2025, September 5) retrieved 5 September 2025 from https://techxplore.com/news/2025-09-ai-fall-short-mimicking-human.html
