Reddit rakes in over $200M through data licensing agreements

In its IPO prospectus filed today with the U.S. Securities and Exchange Commission, Reddit repeatedly emphasized how much it thinks it stands to gain, and has already gained, from data licensing agreements with the companies training AI models on its over one billion posts and over 16 billion comments. “In January 2024, we entered into certain data licensing arrangements with an aggregate contract value of $203.0 million and terms ranging from two to three years,” the prospectus reads. “We expect a minimum of $66.4 million of revenue to be recognized during the year ending December 31, 2024 and the remaining thereafter.” For now, it’s unclear which AI vendors are licensing data from Reddit. Why is Reddit’s data valuable? Until recently, Reddit didn’t gate access to its data for AI training purposes.

As Reddit prepares to go public, its future success could depend on its relationships with artificial intelligence (AI) vendors like OpenAI. In its IPO prospectus filed with the U.S. Securities and Exchange Commission, Reddit highlights the importance of data licensing agreements with companies that use its vast collection of over one billion posts and over 16 billion comments to train their AI models.

According to the prospectus, Reddit entered into data licensing agreements with an aggregate contract value of $203.0 million in January 2024, with terms of two to three years. It expects to recognize at least $66.4 million of that as revenue in 2024, which leaves up to $136.6 million for later years. While the specific AI vendors were not disclosed, it has been reported that a “large unnamed AI company” has entered into an agreement worth approximately $60 million annually. However, given that OpenAI CEO Sam Altman owns 8.7% of Reddit and previously served on its board of directors, it would not be surprising if OpenAI were one of Reddit’s data customers.

So why exactly is Reddit’s data so valuable to AI vendors? As Reddit explains, AI models require a large amount of data to “learn” from examples and create outputs such as essays, code, emails, and articles. Vendors like OpenAI scour the web for millions to billions of examples, and Reddit’s vast collection of conversational data can significantly enhance their training sets. However, as the prospectus notes, accessing this data is not simple or free. Reddit’s content is protected under restrictive licenses that may require citation or compensation.

In the past, Reddit did not restrict access to its data for AI training purposes. However, the company reversed course last year, arguing that its valuable data should not be given away for free to the world’s largest companies. As the prospectus states, Reddit’s data is continuously updated and reflects the latest trends, making it invaluable for training and improving large language models.

This trend of data licensing agreements with AI vendors is not unique to Reddit. Content producers, from stock media libraries to news publishers, are increasingly turning to these agreements as the rise of chatbots and AI-powered search engines threatens to reduce their traffic. In fact, a recent study found that AI-powered search engines could potentially answer a user’s query 75% of the time without the need to visit a publisher’s website.

At the same time, vendors face mounting legal challenges for using data without permission or compensation. The New York Times, for example, has accused OpenAI of using its content to create products that compete with news publishers, ultimately hurting its business. OpenAI has entered into similar licensing agreements with image gallery Shutterstock and publishers such as Axel Springer, which owns Politico and Business Insider. However, these agreements are typically much smaller than Reddit’s, with a reported maximum of $5 million per year.

Max Chen

Max Chen is an AI expert and journalist with a focus on the ethical and societal implications of emerging technologies. He has a background in computer science and is known for his clear and concise writing on complex technical topics. He has also written extensively on the potential risks and benefits of AI, and is a frequent speaker on the subject at industry conferences and events.
