Reddit tightens security against AI bots scraping platform content
Reddit, the widely used social media platform, is updating its Robots Exclusion Protocol file (robots.txt) to shield its content from automated web crawlers. The company will also continue rate-limiting and blocking unknown bots and crawlers. The move is primarily aimed at preventing AI companies from training their models on Reddit's content without permission.
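For context, the Robots Exclusion Protocol is a plain-text file served at /robots.txt that tells crawlers which paths they may fetch. The sketch below shows how a compliant crawler is expected to check the live file before requesting a page, using Python's standard library; the user agent name "ExampleBot" is hypothetical, and Reddit's actual rules may differ from what this illustrates.

```python
# Minimal sketch of a crawler honoring the Robots Exclusion Protocol.
# "ExampleBot" is a hypothetical user agent, not a real crawler.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://www.reddit.com/robots.txt")
parser.read()  # fetch and parse the live robots.txt file

# A compliant crawler checks each URL before requesting it.
url = "https://www.reddit.com/r/technology/"
if parser.can_fetch("ExampleBot", url):
    print(f"Allowed to crawl {url}")
else:
    print(f"robots.txt disallows crawling {url} for this user agent")
```

The key point is that the check is entirely voluntary: nothing in the protocol itself stops a crawler that skips it.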
Reddit's updated protocol targets unauthorized AI crawlers
The updated protocol will not affect most users or good-faith actors such as researchers and organizations like the Internet Archive, but it could deter AI companies from using Reddit's content without permission. Because robots.txt is purely advisory, however, some AI crawlers may simply ignore the updated file. Reddit has stated that any bots or crawlers that do not adhere to its Public Content Policy will be rate-limited or blocked.
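Since robots.txt carries no technical enforcement, restrictions of the kind Reddit describes have to happen on the server side. The following is a minimal sketch of that idea, with a hypothetical blocklist and assumed limits; Reddit's actual enforcement infrastructure is not public.

```python
# Sketch of server-side enforcement: an advisory robots.txt is paired with
# rate limiting and outright blocking. All names and limits are hypothetical.
import time
from collections import defaultdict

RATE_LIMIT = 10          # max requests per window (assumed value)
WINDOW_SECONDS = 60      # sliding window length (assumed value)
BLOCKED_AGENTS = {"UnknownScraperBot"}  # hypothetical blocklist

request_log: dict[str, list[float]] = defaultdict(list)

def allow_request(user_agent: str, client_ip: str) -> bool:
    """Return True if the request should be served, False if refused."""
    if user_agent in BLOCKED_AGENTS:
        return False  # outright block for known non-compliant crawlers
    now = time.monotonic()
    history = request_log[client_ip]
    # Drop timestamps that have fallen out of the sliding window.
    history[:] = [t for t in history if now - t < WINDOW_SECONDS]
    if len(history) >= RATE_LIMIT:
        return False  # rate limit exceeded for this client
    history.append(now)
    return True
```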
Reddit's new measures follow recent controversy
The announcement comes in the wake of a recent Wired investigation, which reported that the AI-powered search startup Perplexity had been scraping and using Wired's content without permission. Despite being blocked in Wired's robots.txt file, Perplexity reportedly continued to scrape the publication's website. In response, Perplexity CEO Aravind Srinivas said that the robots.txt file is not a legal framework.
Reddit's new policy exempts partners with agreements
Reddit's changes will not affect companies with which it has an agreement. Google, for instance, which has a reported $60 million deal with Reddit, is allowed to train its AI models on content from the platform. The implication is that other companies wishing to use Reddit's data for AI training will need to negotiate access terms.