Wikipedia weathers AI challenges but faces new pressures from data scrapers: Study

October 1, 2025

Lisa Lock, scientific editor; Andrew Zinin, lead editor

ChatGPT has not decreased activity on the world's largest online encyclopedia, but AI data scrapers and the influence of large language models still cast a shadow over its future, research suggests.

Work by King's College London examined changes in aggregate views of Wikipedia in 12 languages: six for which ChatGPT was available and six for which it was not. The researchers found no sign of reduced usage since the AI model was introduced in 2022.

However, they did note a slowed growth in usage in languages where ChatGPT was active compared to those where it was not, suggesting the program has had a limited impact.
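The comparison described above can be illustrated with a small sketch. It is not the study's method, which this article does not detail; it is a simple before-and-after growth comparison between two groups of languages. The function names, language labels, and view counts are all hypothetical and chosen only to show how usage can keep rising in both groups while growing more slowly in one of them.

```python
from statistics import mean


def growth(before, after):
    """Relative change in mean page views between two periods."""
    return (mean(after) - mean(before)) / mean(before)


def growth_gap(available, unavailable):
    """Mean growth where ChatGPT was available minus mean growth elsewhere."""
    g_available = mean(growth(b, a) for b, a in available.values())
    g_unavailable = mean(growth(b, a) for b, a in unavailable.values())
    return g_available - g_unavailable


# Hypothetical monthly page views in millions: (before launch, after launch).
# These numbers are invented for illustration and are not from the study.
available = {
    "lang_a": ([100, 100], [104, 104]),
    "lang_b": ([50, 50], [51, 53]),
}
unavailable = {
    "lang_c": ([40, 40], [44, 44]),
    "lang_d": ([20, 20], [23, 21]),
}

gap = growth_gap(available, unavailable)
print(f"growth gap: {gap:+.1%}")
```

In this made-up data every language gains views, so there is no reduction in usage, yet the gap is negative because the group with ChatGPT access grows more slowly. That is the pattern the researchers report.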

In 2021, a long-time Wikipedia editor infamously raised the idea of the "death" of the platform due to the influence of AI. In this scenario, chatbots like GPT would supplant Wikipedia as the primary source of online information, replacing human editors with AI-generated overviews and polluting the information sphere through well-documented hallucinations.

Some in the industry fear this has come to pass, with worldwide web traffic to referral sites, of which Wikipedia is the largest, falling by 15% between June 2024 and June 2025.

The paper, published in ACM Collective Intelligence, refutes this form of "death." However, the researchers suggest the cost of running servers is rising rapidly because of the influx of AI data scrapers using Wikipedia to train AI models, which the website's moderators say could still threaten the current structure of the platform.

Professor Elena Simperl, professor of computer science at King's and co-director of the King's Institute for Artificial Intelligence, said, "Our work did not confirm the most alarmist scenario, but we're not out of the woods yet. AI developers are letting their scrapers loose on Wikipedia to train them on high-quality data, pushing up traffic to levels where Wikipedia's servers are struggling to keep up. Generative AI summaries are also using Wikipedia's data in web searches but not crediting sources, siphoning web traffic away while borrowing the platform's work.

"For free services like this, no one stops to ask how it's being paid for—and now Wikipedia is having to make the tough decision of where to allocate their limited resources to deal with this. It's vital as a community we take steps to protect this important platform, and we hope to turn our work into a monitoring tool where the community can track how AI is impacting Wikipedia."

Wikipedia is the largest online crowdsourced encyclopedia, consisting of more than 6.6 million articles in 292 languages as of 2023, and is a major source of free information for search engines and numerous online communities. This is particularly the case for speakers of languages outside Europe and East Asia, who depend heavily on Wikipedia for access to freely available information.

Postdoc and first author Neal Reeves suggests there are steps available to protect Wikipedia. "Ultimately, we need a new social contract between AI companies and providers of high-quality data like Wikipedia where they retain more power over their material, while still allowing for their data to be used for training purposes.

"Collaboration, like that seen in programs like MLCommons, is needed to reach across the aisle and ensure that the next generation of AI models are trained well, but in a way that doesn't destroy one of the free internet's greatest resources."

In the future, the team hope to use the feedback they've received from the Wikipedia community to develop an openly available monitoring tool that users around the world can deploy to analyze the state of Wikipedia more easily and with more rigorous analytical methods.

More information: Neal Reeves et al, Exploring the impact of ChatGPT on Wikipedia engagement, Collective Intelligence (2025). DOI: 10.1177/26339137251372599

Provided by King's College London
