September 11, 2025
Do chatbots have a moral compass? Researchers turn to Reddit to find out

By challenging AI chatbots to judge thousands of moral dilemmas posted in a popular Reddit forum, UC Berkeley researchers revealed that each platform appears to follow its own set of ethics.
More and more people are turning to ChatGPT or other AI chatbots for advice and emotional support, and it's easy to see why. Unlike a friend or a therapist, a chatbot is always available, listens to everything you have to say, and provides responses that are often thoughtful and validating.
But confiding in chatbots can be risky. Many of these technologies are designed primarily to drive engagement, and may provide users with responses that are false or harmful. And unlike a friend or therapist, the output of a chatbot reflects the norms and biases of the algorithm's dataset, which could differ from those of your social group or community.
With many people seeking advice from chatbots, these unknown norms and biases could have surprising impacts on human behavior and society at large.
"Through their advice and feedback, these technologies are shaping how humans act, what they believe and what norms they adhere to," said Pratik Sachdeva, a senior data scientist at UC Berkeley's D-Lab. "But many of these tools are proprietary. We don't know how they were trained. We don't know how they are aligned."
To start to reveal the hidden norms encoded in popular AI chatbots and how they might impact human behavior, Sachdeva and Tom van Nuenen, a senior data scientist and lecturer at the D-Lab, turned to the internet's favorite source of moral dilemmas: Reddit's "Am I the Asshole?" (or AITA) forum.
In a study published as a preprint on arXiv, Sachdeva and Van Nuenen confronted seven different large language models (LLMs), the AI systems that power chatbots, with more than 10,000 real-world social conflicts posted to the forum, asking each model to decide who was at fault and comparing its responses to those of Reddit users.
They found that the seven chatbots often showed striking differences in how they judged the moral dilemmas, suggesting that each LLM reflects different ethical standards. However, the consensus opinion of the seven chatbots usually agreed with the consensus verdict of the Reddit users, or Redditors, who had weighed in on the same posts.
"When you have a dilemma, you might ask a series of different friends what they think, and each of them might give you a different opinion. In essence, this is what Reddit users are doing on the AITA forum," Sachdeva said.
"You could do the same thing with chatbots—first, you ask ChatGPT, then you ask Claude and then you ask Gemini. When we did that, we found that there was consistency between the majority opinions of Redditors and the majority opinion of chatbots."
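That comparison is easy to picture in code. The sketch below is purely illustrative, with made-up verdicts for a single dilemma; it simply takes a majority vote over the chatbots' answers and checks whether it matches the Redditors' majority verdict.

```python
from collections import Counter

def majority_verdict(verdicts: list[str]) -> str:
    """Return the most common verdict label among a group of judges."""
    return Counter(verdicts).most_common(1)[0][0]

# Hypothetical verdicts for one dilemma: one per chatbot, one per Redditor
chatbot_verdicts = [
    "Not the Asshole", "Not the Asshole", "You are the Asshole",
    "Not the Asshole", "Everyone's the Asshole", "Not the Asshole", "Not the Asshole",
]
redditor_verdicts = [
    "Not the Asshole", "Not the Asshole", "You are the Asshole", "Not the Asshole",
]

# Do the two majority opinions agree for this dilemma?
print(majority_verdict(chatbot_verdicts) == majority_verdict(redditor_verdicts))  # True
```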
On the AITA forum, Redditors share everyday interpersonal conflicts, ranging from broken promises to privacy violations, and other users discuss whether the original poster was morally at fault in the situation.
Respondents share their reasoning along with standard phrases, including "You are the Asshole," "Not the Asshole," "No Assholes here," "Everyone's the Asshole," and "More information needed." The response that receives the most upvotes is considered the final verdict.
"'Am I the Asshole?" is a useful antidote to the very structured moral dilemmas that we see in a lot of academic research," Van Nuenen said. "The situations are messy, and it's that messiness that we wanted to confront large language models with."
The standardized response phrases also make it easy to evaluate chatbots' moral judgments and compare them with each other and with actual Reddit users, Van Nuenen said.
In the study, Sachdeva and Van Nuenen consulted seven LLMs: OpenAI's GPT-3.5 and GPT-4, Anthropic's Claude Haiku, Google's PaLM 2 Bison and Gemma 7B, Meta's Llama 2 7B, and Mistral 7B. For each AITA scenario, the researchers asked each LLM to provide both a standardized response and a short description of its reasoning.
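The study's exact prompts aren't reproduced in the article, but the general pattern, feeding a model the post and asking it to choose one of the standardized labels plus a brief justification, might look roughly like the sketch below. The use of the OpenAI Python client, the model name and the prompt wording are illustrative assumptions, not details taken from the paper.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

VERDICTS = [
    "You are the Asshole",
    "Not the Asshole",
    "No Assholes here",
    "Everyone's the Asshole",
    "More information needed",
]

def judge_dilemma(post_text: str, model: str = "gpt-4") -> str:
    """Ask a chat model for one standardized AITA verdict plus brief reasoning.
    The prompt wording and model name are illustrative, not the study's."""
    prompt = (
        "Read the following 'Am I the Asshole?' post and respond with exactly one "
        f"of these verdicts: {', '.join(VERDICTS)}. "
        "Then give a short paragraph explaining your reasoning.\n\n"
        f"Post:\n{post_text}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In the study itself, the same kind of request would go to each of the seven models so their standardized verdicts could be compared directly with one another and with the Redditors' verdicts.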
Though the models often disagreed with each other, they were generally very self-consistent: when the researchers posed the same dilemma to a model multiple times, it tended to give the same answer each time. This suggests that the models are not responding randomly, but are in fact encoding different norms and values.
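That check can be sketched as follows, again as an assumption-laden illustration rather than the paper's implementation: query one model on the same post several times and measure how often its most common verdict recurs. The `judge` argument stands in for any function that maps a post to a single verdict label, such as a stripped-down version of the hypothetical helper above.

```python
from collections import Counter
from typing import Callable

def self_consistency(judge: Callable[[str], str], post_text: str, n_trials: int = 5) -> float:
    """Pose the same dilemma to the same model several times and return the
    fraction of trials that match its most common verdict."""
    verdicts = [judge(post_text) for _ in range(n_trials)]
    _, top_count = Counter(verdicts).most_common(1)[0]
    return top_count / n_trials

# A score near 1.0 means the model gives the same verdict nearly every time,
# i.e. its judgments are stable rather than random.
```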
To start to tease apart these differences in moral reasoning, the researchers analyzed the LLMs' written responses, paying attention to how sensitive each model was to six broad moral themes: fairness, feelings, harms, honesty, relational obligation and social norms.
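The article doesn't say how sensitivity to those themes was measured, but a crude, keyword-counting stand-in gives a sense of what such an analysis might look like. The keyword lists below are invented for illustration and are far simpler than anything a real analysis would rely on.

```python
from collections import Counter

# Naive, invented keyword lists for the six themes named in the article
THEME_KEYWORDS = {
    "fairness": ["fair", "unfair", "deserve", "equal"],
    "feelings": ["feel", "hurt", "upset", "emotion"],
    "harms": ["harm", "damage", "danger", "suffer"],
    "honesty": ["honest", "lie", "lied", "truth"],
    "relational obligation": ["family", "friend", "owe", "duty"],
    "social norms": ["rude", "polite", "appropriate", "norm"],
}

def theme_counts(rationale: str) -> Counter:
    """Count keyword mentions of each moral theme in one model's written reasoning."""
    text = rationale.lower()
    return Counter(
        {theme: sum(text.count(word) for word in words)
         for theme, words in THEME_KEYWORDS.items()}
    )

example = "It was unfair to lie to your friend, and it clearly hurt her feelings."
print(theme_counts(example).most_common(3))
```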
"We found that ChatGPT-4 and Claude are a little more sensitive to feelings relative to the other models, and that a lot of these models are more sensitive to fairness and harms, and less sensitive to honesty," Sachdeva said.
That could mean that when assessing a conflict, a model might be more likely to take the side of someone who was dishonest than of someone who caused harm. "We're still laying the groundwork, but in future work we hope to actually identify some salient trends," Sachdeva said.
Interestingly, they found that Mistral 7B relied heavily on the "No assholes here" label, not necessarily because it thought no one was at fault, but because it was taking the term "asshole" more literally than other models.
"Its own internalization of the concept of assholes was very different from the other models, which raises interesting questions about a model's ability to pick up the norms of the subreddit," Sachdeva said.
In a follow-up study, Sachdeva and Van Nuenen are exploring how chatbots deliberate with each other on moral dilemmas. Their preliminary findings indicate that models have different approaches to conforming and reaching consensus. For example, the GPT models were less likely to change their assignment of blame in moral dilemmas when given pushback from other models. They also refined their analysis of values, finding that different models relied on different values to make their arguments.
As Sachdeva and Van Nuenen continue studying the inner workings of major AI models and advocating for more transparency in AI design and development, they hope their research also highlights the importance of being mindful of how we all use the technology—and the sneaky ways that it might be influencing us.
"We want people to be actively thinking about why they are using LLMs, when they are using LLMs and if they are losing the human element by relying on them too much," Sachdeva said.
"Thinking about how LLMs might be reshaping our behavior and beliefs is something only humans can do."
More information: Pratik S. Sachdeva et al, Normative Evaluation of Large Language Models with Everyday Moral Dilemmas, arXiv (2025). DOI: 10.48550/arxiv.2501.18081
Journal information: arXiv
Provided by University of California – Berkeley