Cloudflare has begun blocking AI crawlers from scraping content from websites by default, introducing a new model that allows publishers to charge bots for access.

The internet infrastructure giant announced on Tuesday that it will no longer allow artificial intelligence (AI) companies to freely access content hosted on its platform, unless explicitly permitted by website owners.

The new policy takes effect immediately for all new websites signing up on Cloudflare.

Blocking bots by default

This latest move builds on a feature Cloudflare introduced in September 2023, which allowed publishers to block AI crawlers with a single click.

Going forward, blocking these bots will be the default setting for all websites under Cloudflare’s network.

Cloudflare CEO Matthew Prince emphasized that the company’s decision to block AI crawlers by default is aimed at rebalancing control in favor of content creators, while still supporting innovation in AI development.

“AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators while still helping AI companies innovate.

“This is about safeguarding the future of a free and vibrant Internet with a new model that works for everyone,” the CEO stated.

As part of the new framework, Cloudflare is introducing a “pay-per-crawl” option that allows publishers to charge AI companies that want to access their content.

Cloudflare’s content delivery network (CDN) handles a significant share of global internet traffic, making this decision impactful for AI companies that rely heavily on public data for model training.

According to Cloudflare’s 2023 report, approximately 16% of all global internet traffic is routed directly through its Content Delivery Network (CDN). This figure highlights the scale and centrality of Cloudflare’s infrastructure in the global internet ecosystem.

What are AI crawlers?

AI crawlers are automated bots used to extract massive amounts of text, images, and data from the web to train large language models developed by companies like OpenAI and Google.

Unlike traditional search engine bots, these crawlers don’t direct users back to the source content. Instead, they use the extracted material to generate responses or predictions, often without crediting or compensating the original creators.

Cloudflare criticized this approach, saying it undermines content creators and deprives publishers of traffic and advertising revenue.

Under the pay-per-crawl model, AI firms that want access to a participating publisher’s content would have to compensate the content owner, rather than harvesting it freely without permission or payment.

Reactions

OpenAI, the Microsoft-backed AI firm, reportedly declined to join Cloudflare’s initiative. The company argued that Cloudflare was adding a middleman to the system.

OpenAI also emphasized its early adoption of robots.txt, a tool that allows publishers to block scrapers, and said it respects such signals from websites.
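For readers unfamiliar with robots.txt, the sketch below shows roughly how such a file can tell one crawler to stay away while leaving the site open to others. It is only an illustration: the file contents, the example.com URL and the "ExampleSearchBot" name are hypothetical, and "GPTBot" is used because it is the user-agent name OpenAI publishes for its crawler. The check uses Python's standard urllib.robotparser module and says nothing about how Cloudflare's own blocking works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block GPTBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleSearchBot" is a made-up crawler name standing in for any other bot.
    for agent in ("GPTBot", "ExampleSearchBot"):
        allowed = can_crawl(agent, "https://example.com/articles/1")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Compliance with robots.txt is voluntary: the file states the publisher's wishes, and it is up to each crawler to honor them.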

Matthew Holman, a partner at U.K.-based law firm Cripps, explained the potential impact of Cloudflare’s new policy on AI model development in an interview with CNBC.

“AI crawlers are typically seen as more invasive and selective when it comes to the data they consume. They have been accused of overwhelming websites and significantly impacting user experience,” said Holman.

He stated that if Cloudflare’s new blocking system proves effective, it could limit AI chatbots’ ability to collect data needed for training and search functionalities. This may result in short-term disruptions to AI model development and, over time, could pose challenges to the sustainability and performance of those models.