Cloudflare co-founder and CEO Matthew Prince has disclosed the cause of the company’s major service disruption, which knocked parts of the internet offline for hours on Tuesday, describing it as the company’s worst outage since 2019.

Several Nigerian websites on Cloudflare’s network went down or loaded slowly during the global outage, disrupting access to news outlets, e-commerce platforms and other services across the country.

“Today was Cloudflare’s worst outage since 2019. We’ve had outages that have made our dashboard unavailable. Some have caused newer features to not be available for a period of time. But in the last 6+ years, we’ve not had another outage that has caused the majority of core traffic to stop flowing through our network,” Prince wrote in a blog post.

What triggered the disruption

Cloudflare, which as of last year carried traffic for about 20 percent of the global web, said the outage stemmed from an internal change to the permissions system of a database linked to its Bot Management service. The company stressed that the problem had nothing to do with generative AI tools, DNS issues or an attack.

“The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind. Instead, it was triggered by a change to one of our database systems’ permissions, which caused the database to output multiple entries into a ‘feature file’ used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.

“The software running on these machines to route traffic across our network reads this feature file to keep our Bot Management system up to date with ever-changing threats. The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail.

“After we initially wrongly suspected the symptoms we were seeing were caused by a hyper-scale DDoS attack, we correctly identified the core issue and were able to stop the propagation of the larger-than-expected feature file and replace it with an earlier version of the file. Core traffic was largely flowing as normal by 14:30. We worked over the next few hours to mitigate increased load on various parts of our network as traffic rushed back online. As of 17:06 all systems at Cloudflare were functioning as normal,” he stated.
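The failure chain Prince describes can be illustrated with a short sketch. This is not Cloudflare’s code: the function names, the limit of 100 and the count of 60 features are assumptions chosen only to show how duplicated database rows can double a file and push it past a fixed limit in the software that reads it.

```python
# Illustrative only: names and numbers are assumptions, not Cloudflare's code.
MAX_FEATURES = 100  # hypothetical fixed limit in the consuming software


def build_feature_file(rows):
    """Build the feature list from database rows, one entry per row."""
    return [row["name"] for row in rows]


def load_feature_file(features):
    """Strict loader: refuses any file that exceeds the fixed limit."""
    if len(features) > MAX_FEATURES:
        raise ValueError(
            f"{len(features)} features exceeds limit of {MAX_FEATURES}"
        )
    return list(features)


# Before the change: the query returns each feature once.
rows = [{"name": f"feature_{i}"} for i in range(60)]
# After the change: the same query returns every row twice.
duplicated_rows = rows + rows

normal = load_feature_file(build_feature_file(rows))  # 60 entries, under the limit

try:
    load_feature_file(build_feature_file(duplicated_rows))  # 120 entries
    oversized_loaded = True
except ValueError:
    # In this sketch the error is caught; an unhandled failure at this
    # point is the kind of crash the quote describes.
    oversized_loaded = False
```

In this sketch the normal file loads, while the doubled file is rejected by the size check.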

Widespread impact across major platforms

The crash disconnected several global services, including X, ChatGPT and the outage-tracking platform Downdetector.

Because the breakdown affected requests tied to bot scoring, companies that relied on Cloudflare’s bot rules ended up blocking legitimate traffic. Businesses that did not use bot-based rules stayed online.

The failure resembled recent major outages involving Microsoft Azure and Amazon Web Services, underscoring concerns about the increasing dependence on a small number of internet infrastructure providers.

Cloudflare’s plan to prevent a recurrence

In the blog post, Prince outlined four immediate measures Cloudflare is taking to avoid a similar outage in the future:

Hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input

Enabling more global kill switches for features

Eliminating the ability for core dumps or other error reports to overwhelm system resources

Reviewing failure modes for error conditions across all core proxy modules
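A rough sketch of what the first two measures could look like in practice follows. Everything in it is hypothetical, including the FeatureStore class, its method names and the limit of 100. It only illustrates the general pattern of validating an internally generated file like untrusted input, keeping the last known-good version when validation fails, and exposing a kill switch for the feature.

```python
# Hypothetical sketch; none of these names or limits come from Cloudflare.
MAX_FEATURES = 100


class FeatureStore:
    def __init__(self, initial):
        self.active = list(initial)  # last known-good feature list
        self.enabled = True          # kill switch for the whole feature

    def validate(self, candidate):
        """Check an internally generated file as if it were untrusted input."""
        if len(candidate) > MAX_FEATURES:
            return "too many entries"
        if len(set(candidate)) != len(candidate):
            return "duplicate entries"
        return None

    def ingest(self, candidate):
        """Apply a new file only if it is valid; otherwise keep the old one."""
        if self.validate(candidate) is not None:
            return False  # reject and keep serving the last known-good file
        self.active = list(candidate)
        return True

    def features(self):
        """Return the active features, or nothing if the switch is off."""
        return self.active if self.enabled else []


store = FeatureStore(["a", "b"])
accepted = store.ingest(["a", "b", "c"])       # valid update is applied
rejected = store.ingest(["a", "a", "b", "b"])  # duplicated file is refused
```

The design choice here is that a bad file degrades to stale data rather than a crash, and operators can disable the feature entirely by setting `enabled` to `False`.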

The company acknowledged that as the internet becomes more centralized around a few key infrastructure providers, incidents of this scale may be harder to avoid.