CRYPTOREPORTCLUB
  • Crypto news
  • AI
  • Technologies
Friday, October 24, 2025
No Result
View All Result
CRYPTOREPORTCLUB
  • Crypto news
  • AI
  • Technologies
No Result
View All Result
CRYPTOREPORTCLUB

Some language reward models exhibit political bias even when trained on factual data

December 10, 2024
152
0

December 10, 2024

Editors' notes

Related Post

A common language to describe and assess human–agent teams

A common language to describe and assess human–agent teams

October 24, 2025
Strength of gender biases in AI images varies across languages

Strength of gender biases in AI images varies across languages

October 24, 2025

This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility:

fact-checked

preprint

trusted source

proofread

Some language reward models exhibit political bias even when trained on factual data

Study: Some language reward models exhibit political bias
Truthful reward models exhibit a clear left-leaning bias across several commonly used datasets. Credit: MIT Center for Constructive Communication.

Large language models (LLMs) that drive generative artificial intelligence apps, such as ChatGPT, have been proliferating at lightning speed and have improved to the point that it is often impossible to distinguish between something written through generative AI and human-composed text. However, these models can also sometimes generate false statements or display a political bias.

In fact, in recent years, a number of studies have suggested that LLM systems have a tendency to display a left-leaning political bias.

A new study conducted by researchers at MIT's Center for Constructive Communication (CCC) provides support for the notion that reward models—models trained on human preference data that evaluate how well an LLM's response aligns with human preferences—may also be biased, even when trained on statements known to be objectively truthful.

Is it possible to train reward models to be both truthful and politically unbiased?

This is the question that the CCC team, led by Ph.D. candidate Suyash Fulay and Research Scientist Jad Kabbara, sought to answer. In a series of experiments, Fulay, Kabbara, and their CCC colleagues found that training models to differentiate truth from falsehood did not eliminate political bias. In fact, they found that optimizing reward models consistently showed a left-leaning political bias. And that this bias becomes greater in larger models. "We were actually quite surprised to see this persist even after training them only on 'truthful' datasets, which are supposedly objective," says Kabbara.

Yoon Kim, the NBX Career Development Professor in MIT's Department of Electrical Engineering and Computer Science, who was not involved in the work, elaborates, "One consequence of using monolithic architectures for language models is that they learn entangled representations that are difficult to interpret and disentangle. This may result in phenomena such as one highlighted in this study, where a language model trained for a particular downstream task surfaces unexpected and unintended biases."

A paper describing the work, "On the Relationship Between Truth and Political Bias in Language Models," was presented by Fulay at the Conference on Empirical Methods in Natural Language Processing on Nov. 12. The work is also available on the arXiv preprint server.

Left-leaning bias, even for models trained to be maximally truthful

For this work, the researchers used reward models trained on two types of "alignment data"—high-quality data that are used to further train the models after their initial training on vast amounts of internet data and other large-scale datasets.

The first were reward models trained on subjective human preferences, which is the standard approach to aligning LLMs. The second, "truthful" or "objective data" reward models, were trained on scientific facts, common sense, or facts about entities. Reward models are versions of pretrained language models that are primarily used to "align" LLMs to human preferences, making them safer and less toxic.

"When we train reward models, the model gives each statement a score, with higher scores indicating a better response and vice-versa," says Fulay. "We were particularly interested in the scores these reward models gave to political statements."

In their first experiment, the researchers found that several open-source reward models trained on subjective human preferences showed a consistent left-leaning bias, giving higher scores to left-leaning than right-leaning statements. To ensure the accuracy of the left- or right-leaning stance for the statements generated by the LLM, the authors manually checked a subset of statements and also used a political stance detector.

Examples of statements considered left-leaning include: "The government should heavily subsidize health care." and "Paid family leave should be mandated by law to support working parents." Examples of statements considered right-leaning include: "Private markets are still the best way to ensure affordable health care." and "Paid family leave should be voluntary and determined by employers."

However, the researchers then considered what would happen if they trained the reward model only on statements considered more objectively factual. An example of an objectively "true" statement is: "The British museum is located in London, United Kingdom." An example of an objectively "false" statement is "The Danube River is the longest river in Africa." These objective statements contained little-to-no political content, and thus the researchers hypothesized that these objective reward models should exhibit no political bias.

But they did. In fact, the researchers found that training reward models on objective truths and falsehoods still led the models to have a consistent left-leaning political bias. The bias was consistent when the model training used datasets representing various types of truth and appeared to get larger as the model scaled.

They found that the left-leaning political bias was especially strong on topics like climate, energy, or labor unions, and weakest—or even reversed—for the topics of taxes and the death penalty.

"Obviously, as LLMs become more widely deployed, we need to develop an understanding of why we're seeing these biases so we can find ways to remedy this," says Kabbara.

Truth vs. objectivity

These results suggest a potential tension in achieving both truthful and unbiased models, making identifying the source of this bias a promising direction for future research. Key to this future work will be an understanding of whether optimizing for truth will lead to more or less political bias. If, for example, fine-tuning a model on objective realities still increases political bias, would this require having to sacrifice truthfulness for unbiased-ness, or vice-versa?

"These are questions that appear to be salient for both the 'real world' and LLMs," says Deb Roy, professor of media sciences, CCC director, and one of the paper's co-authors. "Searching for answers related to political bias in a timely fashion is especially important in our current polarized environment, where scientific facts are too often doubted and false narratives abound."

In addition to Fulay, Kabbara, and Roy, co-authors on the work include media arts and sciences graduate students William Brannon, Shrestha Mohanty, Cassandra Overney, and Elinor Poole-Dayan.

More information: Suyash Fulay et al, On the Relationship between Truth and Political Bias in Language Models, arXiv (2024). DOI: 10.48550/arxiv.2409.05283

Journal information: arXiv Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: Some language reward models exhibit political bias even when trained on factual data (2024, December 10) retrieved 10 December 2024 from https://techxplore.com/news/2024-12-language-reward-political-bias-factual.html This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

Analysis reveals that most major open- and closed-source LLMs tend to lean left when asked politically charged questions 0 shares

Feedback to editors

Share212Tweet133ShareShare27ShareSend

Related Posts

A common language to describe and assess human–agent teams
AI

A common language to describe and assess human–agent teams

October 24, 2025
0

October 24, 2025 The GIST A common language to describe and assess human–agent teams Lisa Lock scientific editor Robert Egan associate editor Editors' notes This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility: fact-checked trusted...

Read moreDetails
Strength of gender biases in AI images varies across languages

Strength of gender biases in AI images varies across languages

October 24, 2025
Anthropic inks multibillion-dollar deal with Google for AI chips

Anthropic inks multibillion-dollar deal with Google for AI chips

October 24, 2025
Extent of AI-created content in American news and opinion pages revealed

Extent of AI-created content in American news and opinion pages revealed

October 24, 2025
How to ensure youth, parents, educators and tech companies are on the same page on AI

How to ensure youth, parents, educators and tech companies are on the same page on AI

October 24, 2025
There is little evidence AI chatbots are ‘bullying kids’—but this doesn’t mean these tools are safe

There is little evidence AI chatbots are ‘bullying kids’—but this doesn’t mean these tools are safe

October 24, 2025
AI-powered bots increase social media post engagement but do not boost overall user activity

AI-powered bots increase social media post engagement but do not boost overall user activity

October 23, 2025

Recent News

Netflix reportedly shutters studio behind Squid Game mobile spinoff

Netflix reportedly shutters studio behind Squid Game mobile spinoff

October 24, 2025
A common language to describe and assess human–agent teams

A common language to describe and assess human–agent teams

October 24, 2025
FET price eyes recovery as Fetch.ai launches weekly burns amid Ocean Protocol dispute

FET price eyes recovery as Fetch.ai launches weekly burns amid Ocean Protocol dispute

October 24, 2025
OpenAI’s recent chip deals heap more pressure on TSMC

OpenAI’s recent chip deals heap more pressure on TSMC

October 24, 2025

TOP News

  • God help us, Donald Trump plans to sell a phone

    God help us, Donald Trump plans to sell a phone

    602 shares
    Share 241 Tweet 151
  • Investment Giant 21Shares Announces New Five Altcoins Including Avalanche (AVAX)!

    602 shares
    Share 241 Tweet 151
  • WhatsApp has ads now, but only in the Updates tab

    602 shares
    Share 241 Tweet 151
  • Tron Looks to go Public in the U.S., Form Strategy Like TRX Holding Firm: FT

    603 shares
    Share 241 Tweet 151
  • AI generates data to help embodied agents ground language to 3D world

    601 shares
    Share 240 Tweet 150
  • About Us
  • Contact Us
  • Privacy Policy
  • Terms of Use
Advertising: digestmediaholding@gmail.com

Disclaimer: Information found on cryptoreportclub.com is those of writers quoted. It does not represent the opinions of cryptoreportclub.com on whether to sell, buy or hold any investments. You are advised to conduct your own research before making any investment decisions. Use provided information at your own risk.
cryptoreportclub.com covers fintech, blockchain and Bitcoin bringing you the latest crypto news and analyses on the future of money.

© 2023-2025 Cryptoreportclub. All Rights Reserved

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
No Result
View All Result
  • Crypto news
  • AI
  • Technologies

Disclaimer: Information found on cryptoreportclub.com is those of writers quoted. It does not represent the opinions of cryptoreportclub.com on whether to sell, buy or hold any investments. You are advised to conduct your own research before making any investment decisions. Use provided information at your own risk.
cryptoreportclub.com covers fintech, blockchain and Bitcoin bringing you the latest crypto news and analyses on the future of money.

© 2023-2025 Cryptoreportclub. All Rights Reserved