July 10, 2025
Tool devised for detecting AI that scores high on accuracy, low on false accusations

Detecting writing produced by artificial intelligence is a tricky dance: Doing it right means reliably identifying AI-generated text while being careful not to falsely accuse a human of using it. Few tools strike that balance.
A team of researchers at the University of Michigan say they have devised a new way to tell whether a piece of text was written by AI, one that passes both tests. That could be especially useful in academia and public policy as AI content proliferates and becomes harder to distinguish from human-generated writing.
The team calls its tool "Liketropy," a name inspired by the theoretical backbone of its method: It blends likelihood and entropy, the two statistical ideas that power its test.
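In rough terms, and with notation assumed here rather than drawn from the paper, the idea is that if a candidate model actually generated a text, the text's average per-token surprise under that model should concentrate near the model's average predictive entropy:

```latex
% A sketch of the likelihood-entropy comparison; the symbols S_n, H_n and
% the candidate model Q are illustrative, not the paper's exact notation.
\[
  S_n = \frac{1}{n}\sum_{t=1}^{n} -\log Q\left(y_t \mid y_{<t}\right),
  \qquad
  H_n = \frac{1}{n}\sum_{t=1}^{n} H\!\left(Q(\cdot \mid y_{<t})\right)
\]
% If Q generated the tokens y_1, ..., y_n, then S_n - H_n concentrates near
% zero (the paper's finite-sample concentration inequalities bound the
% deviation), so a large gap is statistical evidence that Q did not write
% the text.
```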
They designed "zero-shot statistical tests" that can determine whether a piece of writing came from a human or a large language model (LLM) without requiring prior training on examples of each.
The current tool focuses on LLMs, the type of AI system used to produce text. It relies on statistical properties of the text itself, such as how surprising or predictable its words are, to decide whether it looks more human- or machine-generated.
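As a concrete illustration (a minimal sketch, not the Liketropy implementation), the snippet below scores a text's per-token surprise and predictive entropy under an arbitrary open-weight candidate model, GPT-2, using the Hugging Face transformers library:

```python
# A minimal sketch, not the authors' code: score a text's per-token
# surprise and entropy under a candidate LLM (GPT-2 as a stand-in).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def surprise_and_entropy(text: str):
    """Return mean negative log-likelihood and mean predictive entropy
    (nats per token) of `text` under the candidate model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                 # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    probs = log_probs.exp()
    # Negative log-likelihood of each observed next token ("surprise").
    nll = -log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Shannon entropy of each next-token distribution.
    entropy = -(probs * log_probs).sum(dim=-1)
    return nll.mean().item(), entropy.mean().item()

nll, ent = surprise_and_entropy("The committee will meet on Thursday.")
# For model-generated text, mean surprise tends to sit close to mean
# entropy; a large gap is evidence the model did not produce the text.
print(f"mean NLL = {nll:.3f}, mean entropy = {ent:.3f}, gap = {nll - ent:.3f}")
```

Comparing the two quantities is what makes such a test zero-shot: no detector is trained, and only the candidate model's own probabilities are consulted.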
In testing on large-scale datasets, including those whose underlying models were hidden from the public or where the AI-generated text was designed to evade detectors, the researchers say their tool performed well. When the test is designed with specific LLMs in mind as potential generators of the text, it achieves an average accuracy above 96% and a false accusation rate as low as 1%.
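That 1% figure corresponds to controlling the test's false positive rate. As a hedged sketch of the general idea, not the authors' procedure, a decision threshold can be calibrated on known human-written texts so that at most 1% of them would be flagged:

```python
# Illustrative threshold calibration, not the authors' method: pick the
# cutoff so that at most 1% of known human-written texts are flagged.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical gap scores for the |surprise - entropy| statistic sketched
# above: small gaps suggest model-generated text, large gaps human text.
human_gaps = rng.normal(loc=1.5, scale=0.4, size=5000)        # hypothetical
ai_gaps = np.abs(rng.normal(loc=0.1, scale=0.1, size=5000))   # hypothetical

target_fpr = 0.01  # at most 1% of humans falsely accused
threshold = np.quantile(human_gaps, target_fpr)  # flag "AI" if gap < cutoff

fpr = (human_gaps < threshold).mean()   # realized false accusation rate
tpr = (ai_gaps < threshold).mean()      # detection rate on AI-written text
print(f"threshold = {threshold:.3f}, FPR = {fpr:.1%}, detection = {tpr:.1%}")
```

Raising or lowering target_fpr is one way to turn the "cautious-effective" dial the researchers describe below.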
"We were very intentional about not creating a detector that just points fingers. AI detectors can be overconfident, and that's risky—especially in education and policy," said Tara Radvand, a doctoral student at U-M's Ross School of Business who co-authored the study. "Our goal was to be cautious about false accusations while still flagging AI-generated content with statistical confidence."
Among the researchers' unexpected findings was how little they needed to know about a language model to catch it. Even with minimal information about the model, the test still performed well, challenging the assumption that detection must rely on access, training or cooperation, Radvand said.
The team was motivated by fairness, particularly for international students and non-native English speakers. Emerging literature shows that students who speak English as a second language may be unfairly flagged for "AI-like" writing because of tone or sentence structure.
"Our tool can help these students self-check their writing in a low-stakes, transparent way before submission," Radvand said.
As for next steps, she and her colleagues plan to expand their demo into a tool that can be adapted to different domains. They've learned that fields such as law and science, as well as applications like college admissions, have different thresholds in the "cautious-effective" trade-off.
A critical application for AI detectors is reducing the spread of misinformation on social media. Some actors intentionally train LLMs to adopt extreme beliefs and spread falsehoods on social media to manipulate public opinion.
Because these systems can generate large-scale false content, the researchers say it's crucial to develop reliable detection tools that can flag such content and comments. Early identification helps platforms limit the reach of harmful narratives and protect the integrity of public discourse.
They also plan to speak with U-M business and university leaders about the prospect of adopting their tool as a complement to U-M GPT and the Maizey AI assistant to verify whether text was generated by these tools versus an external AI model, such as ChatGPT.
Liketropy received a Best Presentation Award at the Michigan Student Symposium for Interdisciplinary Statistical Sciences, an annual event organized by graduate students. It was also featured by Paris Women in Machine Learning and Data Science, a France-based community of women interested in machine learning and data science that hosts various events.
The research is published on the arXiv preprint server.
More information: Tara Radvand et al, Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities, arXiv (2025). DOI: 10.48550/arxiv.2501.02406
HuggingFace: huggingface.co/spaces/tararad/ … ketropy-LLM-Detector
Journal information: arXiv
Provided by University of Michigan