Cloudflare's tool helps websites block unauthorized AI scraping
Cloudflare, a leading cloud service provider, has launched a free tool designed to prevent AI bots from scraping content from its clients' websites. The tool will be available to all customers, including those on free plans. The company stated that the feature will "automatically be updated over time as we see new fingerprints of offending bots we identify as widely scraping the web for model training."
Clients respond to AI bot surge
Cloudflare has revealed that 85.2% of its customers have chosen to block AI bots from accessing their websites, including bots that properly identify themselves. The figure was shared in a blog post discussing the increase in bots scraping content for generative AI model training. The most active bots identified were ByteDance-owned Bytespider and OpenAI's GPTBot, which attempted to access 40% and 35% of websites under Cloudflare's protection, respectively. Other AI crawlers include Amazon's Amazonbot and Anthropic's ClaudeBot.
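To illustrate what blocking a bot that "properly identifies itself" can look like in practice, here is a minimal Python sketch that checks a request's User-Agent header against the crawler names mentioned above. This is not Cloudflare's implementation. The function names, the `block_ai_bots` switch, and the sample User-Agent strings are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Crawler names taken from the article; the matching logic is an assumption.
AI_CRAWLER_TOKENS = ("Bytespider", "GPTBot", "Amazonbot", "ClaudeBot")


def identify_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler name found in the User-Agent string, if any."""
    lowered = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def should_block(user_agent: str, block_ai_bots: bool = True) -> bool:
    """Block only when the site owner opted in and the bot names itself."""
    return block_ai_bots and identify_ai_crawler(user_agent) is not None


# Illustrative User-Agent strings, not captured from real traffic.
crawler_ua = "Mozilla/5.0 (compatible; GPTBot/1.0)"
browser_ua = "Mozilla/5.0 (X11; Linux x86_64) Firefox/127.0"

print(should_block(crawler_ua))
print(should_block(browser_ua))
```

A check like this only catches crawlers that announce themselves honestly. A bot that sends a browser-like User-Agent would pass straight through, which is why the article's later points about fingerprinting and detection matter.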
Ongoing battle against AI bots
Cloudflare acknowledges the challenges in consistently blocking AI bots from accessing content. The company stated, "We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection." Despite these difficulties, Cloudflare remains committed to combating this behavior and plans to continually update its bot blocks and evolve its machine learning (ML) models. The company has also established a reporting system for hosts to report suspected AI bots and crawlers.
The growing problem of website scraping
Cloudflare notes that AI bots have become a growing problem as the generative AI boom drives up demand for model training data. Approximately 26% of the top 1,000 websites have chosen to block OpenAI's bot. However, blocking is not foolproof: some vendors have been found to ignore standard bot exclusion rules to gain a competitive advantage. The effectiveness of tools like Cloudflare's will therefore depend on how accurately they detect clandestine AI bots.
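For readers unfamiliar with bot exclusion rules, the most common mechanism is a robots.txt file. The following Python sketch uses the standard library's `urllib.robotparser` to show how a well-behaved crawler would interpret such rules. The robots.txt content, the `example.com` URL, and the helper names are assumptions made for illustration and do not describe any specific site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that turns away two AI crawlers and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Ask whether a compliant crawler with this name may fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
for agent in ("GPTBot", "Bytespider", "ExampleBrowser"):
    print(agent, is_allowed(parser, agent, "https://example.com/article"))
```

These rules are purely advisory. Nothing in robots.txt stops a crawler that chooses to skip the check, which is the gap that server-side detection tools aim to close.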