Reddit tightens security against AI bots scraping platform content
Reddit, the widely used social media platform, is updating its Robots Exclusion Protocol file (robots.txt) to shield its content from automated web crawlers. The company will also continue rate-limiting and blocking unknown bots and crawlers. The move is primarily aimed at preventing AI companies from training their models on Reddit's content without permission.
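For context, the Robots Exclusion Protocol is a plain-text file served at /robots.txt that tells crawlers which paths they may fetch. The sketch below shows how a compliant crawler is expected to check the live file before requesting a page, using Python's standard library; the user agent name "ExampleBot" is hypothetical, and Reddit's actual rules may differ from what this illustrates.

```python
# Minimal sketch of a crawler honoring the Robots Exclusion Protocol.
# "ExampleBot" is a hypothetical user agent, not a real crawler.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://www.reddit.com/robots.txt")
parser.read()  # fetch and parse the live robots.txt file

# A compliant crawler checks each URL before requesting it.
url = "https://www.reddit.com/r/technology/"
if parser.can_fetch("ExampleBot", url):
    print(f"Allowed to crawl {url}")
else:
    print(f"robots.txt disallows crawling {url} for this user agent")
```

The key point is that the check is entirely voluntary: nothing in the protocol itself stops a crawler that skips it.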
Reddit's updated protocol targets unauthorized AI crawlers
The updated protocol will not affect most users or good-faith actors such as researchers and organizations like the Internet Archive, but it could deter AI companies from using Reddit's content without permission. Because robots.txt is purely advisory, however, some AI crawlers may simply ignore the updated file. Reddit has stated that any bots or crawlers that do not adhere to its Public Content Policy will be rate-limited or blocked.
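Since robots.txt carries no technical enforcement, restrictions of the kind Reddit describes have to happen on the server side. The following is a minimal sketch of that idea, with a hypothetical blocklist and assumed limits; Reddit's actual enforcement infrastructure is not public.

```python
# Sketch of server-side enforcement: an advisory robots.txt is paired with
# rate limiting and outright blocking. All names and limits are hypothetical.
import time
from collections import defaultdict

RATE_LIMIT = 10          # max requests per window (assumed value)
WINDOW_SECONDS = 60      # sliding window length (assumed value)
BLOCKED_AGENTS = {"UnknownScraperBot"}  # hypothetical blocklist

request_log: dict[str, list[float]] = defaultdict(list)

def allow_request(user_agent: str, client_ip: str) -> bool:
    """Return True if the request should be served, False if refused."""
    if user_agent in BLOCKED_AGENTS:
        return False  # outright block for known non-compliant crawlers
    now = time.monotonic()
    history = request_log[client_ip]
    # Drop timestamps that have fallen out of the sliding window.
    history[:] = [t for t in history if now - t < WINDOW_SECONDS]
    if len(history) >= RATE_LIMIT:
        return False  # rate limit exceeded for this client
    history.append(now)
    return True
```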
Reddit's new measures follow recent controversy
The announcement comes in the wake of a recent Wired investigation, which reported that the AI-powered search startup Perplexity had been scraping and using Wired's content without permission. Despite being blocked in Wired's robots.txt file, Perplexity reportedly continued to scrape the publication's website. In response, Perplexity CEO Aravind Srinivas said that the robots.txt file is not a legal framework.
Reddit's new policy exempts partners with agreements
Reddit's changes will not affect companies with which it has an agreement. Google, for instance, which has a reported $60 million deal with Reddit, is allowed to train its AI models on content from the platform. The implication is that other companies wishing to use Reddit's data for AI training will need to negotiate access terms.