Large language models struggle with coordination in social and cooperative games
May 28, 2025
by Ingrid Fadelli, contributing writer

Large language models (LLMs), such as the model underpinning the popular conversational platform ChatGPT, are now widely used worldwide to source information and to summarize, analyze and generate texts. Studies examining how LLMs respond in different scenarios could help researchers gain a deeper understanding of their tendencies during social interactions, which could in turn inform their future development.
Researchers at Helmholtz Munich's Institute for Human-Centered AI, the Max Planck Institute for Biological Cybernetics and the University of Tübingen recently set out to examine how different LLMs behave when they are interacting with each other, specifically while playing various cooperative or competitive games. Their findings, published in Nature Human Behaviour, suggest that while LLMs do not perform very well in games that require coordination, there are ways to make their interactions while playing these games more human-like.
"The paper was inspired by a simple but important question: if LLMs are going to interact with humans and each other in real-world applications, how well do they actually understand social dynamics?" Elif Akata, first author of the paper, told Tech Xplore.
"We drew on behavioral game theory, a mathematical approach to understand how humans make strategic decisions in interactive situations and applied it to LLMs."
Many recent studies have assessed the performance of LLMs on specific tasks, such as summarizing texts or finding logical solutions to problems. Instead of evaluating the performance of these models on isolated tasks, Akata and her colleagues wished to better understand how they behave during interactions that are much closer to conversations that humans might have with each other in real-world settings.
"We let different LLMs, including GPT-4, Claude 2, and Llama 2, play hundreds of rounds of classic two-player games (e.g., the Prisoner's Dilemma and the Battle of the Sexes) with each other, with simple hand-coded strategies, or with human participants," explained Akata.
"Each game was played repeatedly to simulate ongoing interactions. We studied whether models could learn to cooperate or coordinate over time and tested how changes to the prompting structure could improve their social behavior."

The results of the tests performed by Akata and her colleagues suggest that LLMs are surprisingly good at acting in their own interest, as they performed particularly well in competitive games, such as the Prisoner's Dilemma.
This is a renowned task in game theory research in which two participants, or in this case two LLMs, imagine that they are accomplices who committed a crime together and are being interrogated separately. Each is offered a deal: confess and betray the other to reduce their own jail time, even though doing so saddles their partner with a long sentence, or stay silent and risk being betrayed themselves.
While the LLMs were found to act in their own self-interest while playing this game (i.e., confessing to the crime), they often performed poorly in games that require coordination, mutual understanding and compromise, such as the Battle of the Sexes. In this game, two romantic partners who have been separated must each choose between two activities to do together; each prefers a different activity, but both would rather end up at the same one than apart.
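The difference between the two games lies in their payoff structures. The snippet below shows standard textbook payoff matrices for both, purely as an illustration (the study's exact values may differ): in the Prisoner's Dilemma, confessing protects a player no matter what the partner does, whereas in the Battle of the Sexes the only rewarding outcomes are the two in which the players pick the same activity, which is why it demands coordination.

```python
# Illustrative textbook payoff matrices for the two games (the study's exact
# values may differ). Keys are (row player's move, column player's move);
# values are (row player's payoff, column player's payoff).

PRISONERS_DILEMMA = {
    ("stay silent", "stay silent"): (3, 3),  # mutual silence: decent for both
    ("stay silent", "confess"):     (0, 5),  # betrayed: worst outcome
    ("confess", "stay silent"):     (5, 0),  # betrayer walks free
    ("confess", "confess"):         (1, 1),  # mutual betrayal: poor for both
}

BATTLE_OF_THE_SEXES = {
    ("opera", "opera"):       (10, 7),  # together, at the row player's favorite
    ("opera", "football"):    (0, 0),   # apart: both get nothing
    ("football", "opera"):    (0, 0),   # apart: both get nothing
    ("football", "football"): (7, 10),  # together, at the column player's favorite
}
```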
"We also discovered that their behavior can be improved with simple interventions like prompting the model to first predict what its partner might do before acting," said Akata. "These findings suggest that current models don't yet have robust social intelligence, but they also show that there are ways to steer them toward more human-like behavior.
"The implications go beyond game theory, as our results show that we can shape LLMs into more socially aware agents, not just ones that generate correct answers, but ones that participate more meaningfully in shared tasks. Imagine an AI that doesn't just answer a question but knows when to listen, when to adapt, and how to gently steer a conversation."
Overall, the findings gathered by Akata and her colleagues suggest that current LLMs tend to act in their own self-interest and are not particularly good at coordinating with others. Nonetheless, the researchers identified some strategies that could help to make LLMs more cooperative and socially aware. Their paper could thus guide future efforts aimed at improving existing models or developing new ones that are more responsive to the needs and inclinations of human users.
"We'd now like to scale up to richer, more realistic social situations, for instance, by studying games with more than two players, interactions with incomplete information, or long-term engagements where models must build and maintain trust," added Akata.
"In the long term, this kind of research could help develop AI systems that are better collaborators. For example, in health care, education, or social support, success often depends on whether an AI can communicate empathy, build rapport, and act in ways that feel supportive and trustworthy to people."
More information: Elif Akata et al, Playing repeated games with large language models, Nature Human Behaviour (2025). DOI: 10.1038/s41562-025-02172-y.
© 2025 Science X Network
Citation: Large language models struggle with coordination in social and cooperative games (2025, May 28), retrieved 28 May 2025 from https://techxplore.com/news/2025-05-ai-game-theory-language-human.html
