Cloudflare has said it has de-listed Perplexity’s verified web crawler bot after observing ‘stealth’ crawling behaviour from the AI search startup.
The cloud infrastructure provider, which also offers web security services, accused Perplexity of attempting to circumvent websites’ preferences by using undeclared bots. Perplexity’s declared web crawlers are listed as ‘PerplexityBot’ and ‘Perplexity-User’.
“We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source ASNs [autonomous system number] to hide their crawling activity,” Cloudflare said in a blog post published this week. It also alleged that Perplexity’s bots continue to ignore robots.txt files, a decades-old web standard that site owners use to tell crawlers which parts of a website they should not crawl or scrape.
These allegations have emerged amid a surge of bots flooding the internet. Since the rise of generative AI, there has also been a sharp increase in bots seeking to scrape massive amounts of data for training AI models.
A report published in June this year by web and AI-focused researcher Henk van Ess revealed that AI chatbots like ChatGPT and Perplexity were able to accurately summarise paywalled articles from outlets such as The Atlantic, The New York Times, and Financial Times in up to half of the tested cases. Smaller websites are especially at risk, as the surge in bot activity can also strain their servers.
“Some supposedly ‘reputable’ AI companies act more like North Korean hackers. Time to name, shame, and hard block them,” Cloudflare CEO Matthew Prince (@eastdakota) said in an X post on August 4, 2025, linking to the company’s report on Perplexity.
However, AI companies like Perplexity have pointed out that traditional crawler and scraper bots differ from AI assistants and AI agents, which also operate on the open web.
Cloudflare’s experiment
Cloudflare said that it was first alerted to Perplexity’s alleged stealth crawling activity by customers, who said that the AI startup was able to access their content even though they had disallowed crawling in their robots.txt files and created additional WAF (Web Application Firewall) rules to block its declared crawlers.
As part of its experiment, Cloudflare purchased new website domains like testexample.com and secretexample.com that had not yet been indexed by any search engine. It created a robots.txt file with directives to stop any respectful bots from accessing any part of the websites.
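To illustrate what such a robots.txt file asks of a crawler, here is a minimal Python sketch using the standard library’s robots.txt parser. The file contents below are an assumption (a blanket disallow for all user agents), not a copy of the file Cloudflare used, and the helper name `is_allowed` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: a blanket rule asking every crawler to stay out.
# This is an illustration, not Cloudflare's actual test file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "Perplexity-User"):
        allowed = is_allowed(ROBOTS_TXT, agent, "https://testexample.com/page")
        print(agent, "allowed" if allowed else "disallowed")
```

A well-behaved crawler runs a check like this before each request and skips disallowed URLs. Nothing in the protocol enforces the answer, which is why compliance depends on the bot’s operator.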
However, Perplexity’s chatbot was allegedly able to provide detailed information about the content hosted on these restricted domains when queried by Cloudflare. “This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their crawlers,” the company said.
Perplexity’s undeclared bot
In cases where Perplexity’s declared crawlers were blocked, the startup allegedly used an undeclared bot via “a generic browser” impersonating “Google Chrome on macOS,” as per Cloudflare.
“This undeclared crawler utilized multiple IPs not listed in Perplexity’s official IP range, and would rotate through these IPs in response to the restrictive robots.txt policy and block from Cloudflare. In addition to rotating IPs, we observed requests coming from different ASNs in attempts to further evade website blocks,” it said.
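The distinction between a declared and an undeclared crawler can be sketched in a few lines of Python: a request counts as declared only if it carries a declared user agent and comes from a published IP range. The IP range below is a reserved documentation range used as a placeholder, not Perplexity’s real range, and the function name and labels are invented for this example. It does not reproduce Cloudflare’s detection method, which the article does not describe.

```python
import ipaddress

# Placeholder: a reserved documentation range, NOT Perplexity's published IPs.
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
# Declared crawler names as given in the article.
DECLARED_AGENTS = ("PerplexityBot", "Perplexity-User")


def classify_request(ip: str, user_agent: str) -> str:
    """Label a request as 'declared', 'unverified', or 'undeclared'."""
    addr = ipaddress.ip_address(ip)
    in_range = any(addr in net for net in DECLARED_RANGES)
    names_itself = any(a.lower() in user_agent.lower() for a in DECLARED_AGENTS)
    if names_itself and in_range:
        return "declared"      # declared user agent from a published range
    if names_itself:
        return "unverified"    # declared user agent from an unlisted IP
    return "undeclared"        # does not identify itself as the crawler


if __name__ == "__main__":
    chrome = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")
    print(classify_request("192.0.2.10", "PerplexityBot/1.0"))
    print(classify_request("203.0.113.5", chrome))
```

A crawler presenting a generic Chrome user agent from rotating, unlisted IPs falls into the last bucket, where a check like this cannot tell it apart from ordinary browser traffic. Attributing such traffic to a specific operator takes additional signals.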
Cloudflare said that, after it blocked Perplexity’s undeclared bot, the chatbot’s AI-generated answers were less specific and lacked details from the original content.
How has Perplexity responded?
On Cloudflare’s allegations regarding its undeclared bot, Perplexity said that the crawling behaviour was from a third-party service it uses occasionally.
“It appears Cloudflare confused Perplexity with 3-6M daily requests of unrelated traffic from BrowserBase, a third-party cloud browser service that Perplexity only occasionally uses for highly specialized tasks (less than 45,000 daily requests),” the company said in a blog post published a day after Cloudflare’s report.
The Jeff Bezos-backed startup also argued that its ‘user-driven AI assistants’ had been mischaracterised by Cloudflare as malicious bots.
“The difference between automated crawling and user-driven fetching isn’t just technical — it’s about who gets to access information on the open web […] When Perplexity fetches a webpage, it’s because you asked a specific question requiring current information. The content isn’t stored for training—it’s used immediately to answer your question,” Perplexity said.
“This controversy reveals that Cloudflare’s systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats. If you can’t tell a helpful digital assistant from a malicious scraper, then you probably shouldn’t be making decisions about what constitutes legitimate web traffic,” it added.