When AI draws our words: Study finds image generators fail basic instructions despite aesthetic success

November 12, 2025

Output from the prompt "photographic image of a woman refusing an apple offered to her by a man." Left: DALL·E; right: Midjourney, May 2024. Credit: DALL·E & Midjourney / University of Liège

Can we really trust artificial intelligence to illustrate our ideas? A team of scientists has examined how well Midjourney and DALL·E, two generative artificial intelligence (GAI) programs, produce images from simple sentences. The verdict is mixed: between aesthetic feats and beginner's mistakes, the machines still have a long way to go.

Since the emergence of GAIs such as Midjourney and DALL·E, creating images from simple sentences has become a fascinating, and sometimes even disturbing, reality. Yet behind this technical feat lies an essential question: how do these machines translate words into visuals? This is what four researchers from the University of Liège, the University of Lorraine and EHESS sought to understand by conducting an interdisciplinary study combining semiotics, computer science and art history.

The paper is published in the journal Semiotic Review.

"Our approach is based on a series of rigorous tests," explains Maria Giulia Dondero, semiotician at the University of Liège. "We submitted very specific requests to these two AI systems and analyzed the images produced according to criteria from the humanities, such as the arrangement of shapes, colors, gazes, the specific dynamism of the still image, the rhythm of its deployment, etc."

The result? The AI systems are capable of generating aesthetically polished images, but they often struggle to follow even the simplest instructions.

The study reveals surprising difficulties. The GAIs handle negation poorly ("a dog without a tail" yields a dog with a tail, or a frame that hides it) and stumble over complex spatial relationships, the correct positioning of elements, and the rendering of consistent gaze and distance relationships ("two women behind a door"). They sometimes translate simple actions such as "fighting" into dance scenes, and struggle to represent temporal sequences such as the beginnings and ends of gestures ("starting to eat," "having finished eating").

"These GAIs allow us to reflect on our own way of seeing and representing the world," says Enzo D'Armenio, former researcher at ULiège, junior professor at the University of Lorraine and lead author of the article. "They reproduce visual stereotypes from their databases, often constructed from Western images, and reveal the limitations of translation between verbal and visual language."

Repeat, validate and analyze

The results obtained by the research team were validated by repetition, up to fifty generations per prompt, in order to establish their statistical robustness. The two models also have distinct aesthetic signatures. Midjourney favors "aestheticized" renderings, with artifacts or textures that embellish the image, sometimes at the expense of strictly following the instructions, while DALL·E, more "neutral" in texture, offers greater compositional control but varies more in the orientation or number of objects.

The series of 50 tests on the prompt "three vertical white lines on a black background" illustrates these tendencies: relative consistency but frequent artifacts for Midjourney; variability in the number and orientation of the lines for DALL·E.

Midjourney 6, results of the prompt "three vertical white lines on a black background," February 2025, repeated 50 times to validate the observations at a smaller scale. Credit: Midjourney
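
The paper itself does not publish its test scripts, but the protocol is simple to sketch. Below is a minimal illustration of such a repeated-generation run using the OpenAI Python SDK; the model name, prompt list and file layout are illustrative assumptions, not the authors' actual setup.

```python
# Sketch of the repeated-generation protocol described above: submit the same
# prompt many times and save every image for later visual analysis.
# Assumes the OpenAI Python SDK with OPENAI_API_KEY set in the environment;
# the model and output layout are illustrative choices, not the paper's.
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI()

PROMPTS = [
    "three vertical white lines on a black background",
    "a dog without a tail",
    "two women behind a door",
]
RUNS_PER_PROMPT = 50  # repetition count reported in the article

for prompt in PROMPTS:
    out_dir = Path("generations") / prompt.replace(" ", "_")
    out_dir.mkdir(parents=True, exist_ok=True)
    for run in range(RUNS_PER_PROMPT):
        result = client.images.generate(
            model="dall-e-3",  # DALL-E 3 accepts only n=1 per call
            prompt=prompt,
            n=1,
            response_format="b64_json",
        )
        image_bytes = base64.b64decode(result.data[0].b64_json)
        (out_dir / f"{run:02d}.png").write_bytes(image_bytes)
```

The saved batches can then be inspected side by side, which is essentially what the figure above shows for Midjourney.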

The study also points out that these AIs are statistical in nature. "GAIs produce the most plausible result based on their training databases and the (sometimes editorial) settings of their designers," explains Adrien Deliège, a mathematician at ULiège. "These choices might standardize the gaze and convey or reorient stereotypes."

A telling example: given the prompt "CEO giving a speech," DALL·E may generate mostly women, while other models produce almost exclusively middle-aged white men, a sign that the imprint of designers and datasets influences the machine's "vision" of the world.
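
Skews like this only become visible in aggregate, across many generations of the same prompt. As a purely illustrative sketch, here is how tallying depictions over such a batch might look; the labels file stands in for a hypothetical human-coding pass, since the paper does not describe its annotation tooling.

```python
# Sketch: tally manually assigned labels for one prompt's generations to
# estimate how often each depiction occurs. The CSV (columns: file,label)
# is a hypothetical artifact of a human-coding pass, not the paper's data.
import csv
from collections import Counter

counts = Counter()
with open("labels_ceo_giving_a_speech.csv", newline="") as f:
    for row in csv.DictReader(f):
        counts[row["label"]] += 1

total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n}/{total} ({n / total:.0%})")
```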

Researchers emphasize that evaluating these technologies requires more than just measuring their statistical effectiveness; it also necessitates using tools from the humanities to understand their cultural and symbolic functioning.

"AI tools are not simply automatic tools," concludes Enzo D'Armenio. "They translate our words according to their own logic, influenced by their databases and algorithms. The humanities have an essential role to play in understanding and evaluating them."

And while these AI tools can already help us illustrate our ideas, they still have a long way to go before they can translate them perfectly.

More information: Enzo D'Armenio et al, For a Semiotic Approach to Generative Image AI, Semiotic Review (2025). DOI: 10.71743/ee5nrx33

Provided by University of Liège
