“Empowering Redditors: Vana’s Proposal to Lease User Data for AI Training”

Vana plans to let users rent out their Reddit data to train AI. A startup, Vana, says it wants users to get paid for training data. In the generative AI boom, data is the new oil. “It does this by allowing users to aggregate their personal data in a non-custodial way … Vana allows users to own AI models and use their data across AI applications.” Here’s how Vana pitches its platform and API to developers: The Vana API connects a user’s cross-platform personal data … to allow you to personalize your application. This month, Vana launched what it’s calling the Reddit Data DAO (decentralized autonomous organization), a program that pools multiple users’ Reddit data (including their karma and post history) and lets them decide together how that combined data is used. We have crunched the numbers and r/datadao is now the largest data DAO in history: Phase 1 welcomed 141,000 Reddit users with 21,000 full data uploads. “Reddit does not share non-public, personal data with commercial enterprises, and when Redditors request an export of their data from us, they receive non-public personal data back from us in accordance with applicable laws.”

Vana Presents a Revolutionary Way to Monetize Your Reddit Data Through AI

In the fast-growing field of generative AI, data is the driving force, which is why it is often called the new oil. Demand for data to train AI models keeps rising, and major tech firms and startups alike are seeking out partnerships with data brokers in pursuit of more advanced and more legally defensible AI-powered products. For instance, Shutterstock has deals with Meta, Google, Amazon, and Apple to supply millions of images for model training. Similarly, OpenAI has signed agreements with several news organizations to train its models on their news archives.

In most cases, however, the individual creators and owners of that data do not share in the proceeds of the lucrative deals struck between companies. A startup called Vana wants to change that.

Vana was co-founded in 2021 by Anna Kazlauskas and Art Abal, who met in a class at the MIT Media Lab focused on building technology for emerging markets. Kazlauskas, who has a background in computer science and economics from MIT, left to launch Iambiq, a fintech automation startup incubated by Y Combinator. Abal, a corporate lawyer by training, was an associate at The Cadmus Group before becoming head of impact sourcing at the data annotation company Appen.

Vana’s platform aims to allow users to pool their data and make it available for generative AI model training. The platform also seeks to create personalized experiences based on this data, such as daily motivational voicemail messages tailored to wellness goals or an art-generating app that understands individual style preferences. According to Kazlauskas, Vana’s infrastructure allows users to create a user-owned data treasury by aggregating personal data in a non-custodial manner.

Vana’s API connects a user’s personal data from various platforms so that developers can build personalized applications. According to Vana’s pitch, this gives an application immediate access to a user’s personalized AI model or underlying data, which simplifies onboarding and eliminates compute cost concerns. The platform also lets users bring their data from “walled gardens” like Instagram, Facebook, and Google to these applications, so that a consumer AI application can feel personalized from the user’s very first interaction.

Vana believes that users should have control over their personal data and be able to utilize it for AI applications across various platforms.

Registering with Vana is straightforward. Users can attach data such as selfies, a description, and voice recordings to a digital avatar. They can also explore applications built on Vana’s platform and data sets, which range from ChatGPT-style chatbots and interactive storybooks to a dating profile generator.

However, one may wonder why anyone would voluntarily hand over their personal data to an anonymous startup, let alone a venture-backed one, in this age of heightened data privacy awareness and ransomware attacks. (Vana has raised a total of $20 million from prominent investors like Paradigm, Polychain Capital, and others.) Can any profit-driven company truly be trusted to handle sensitive data ethically and responsibly?

In response to these concerns, Kazlauskas emphasizes that Vana’s main goal is to give users control over their data. Users can self-host their data instead of storing it on Vana’s servers, and they can control how it is shared with apps and developers. She also argues that Vana has no incentive to exploit users or their data, because the company makes its money from monthly subscriptions (starting at $3.99) and from a data transaction fee charged to developers who use the platform.

She further explains, “We aim to create user-owned and governed models where individuals contribute their data and can bring their data and models to any application.”

While Vana is not itself in the business of selling user data for AI model training, it does let users do so themselves, starting with their Reddit posts.

Recently, Vana launched a program called the Reddit Data DAO (decentralized autonomous organization). The program lets users pool their Reddit data, including karma and post history, and decide together how the combined data is used. Users who join the DAO gain voting rights on decisions such as whether to license the combined data to generative AI companies in exchange for shared profits.

According to a tweet from the official r/datadao account, the DAO already boasts over 141,000 members and 21,000 full data uploads, making it the largest data DAO to date.

The DAO was created in response to Reddit’s recent efforts to commercialize data on its platform. Previously, Reddit did not restrict access to posts and communities for generative AI training. It reversed that policy late last year, just before its IPO, and has since earned over $203 million in licensing fees from partnerships with companies like Google.

The broad idea of the DAO is to liberate user data from major platforms looking to hoard and monetize it.

However, Reddit, which is not officially working with Vana, is not happy about the DAO. It banned Vana’s subreddit dedicated to discussing the DAO, stating that the company was “exploiting” its data export system, which is designed to comply with data privacy regulations like the GDPR and the California Consumer Privacy Act.

A Reddit spokesperson told TechCrunch, “We do not share non-public, personal data with commercial enterprises. Our data arrangements allow us to put guardrails on such entities, even on public information. Direct partnerships between Reddit and vetted organizations, with clear terms and accountability, matters. These partnerships and agreements prevent misuse and abuse of people’s data.”

But is Reddit’s concern justified?

Kazlauskas predicts that as the DAO grows, it could affect how much Reddit can charge for its data. However, with just over 141,000 members, the DAO is still a tiny minority of Reddit’s 73-million-strong user base, and some of its members could be bots or duplicate accounts.
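To put those figures in proportion, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted in this article and treats the 73 million figure as the comparison base; the variable names are purely illustrative.

```python
# Figures quoted in the article.
dao_members = 141_000
full_uploads = 21_000
reddit_users = 73_000_000

# Share of Reddit's user base that has joined the DAO.
member_share = dao_members / reddit_users

# Share of DAO members who completed a full data upload.
upload_rate = full_uploads / dao_members

print(f"DAO members as a share of Reddit users: {member_share:.2%}")
print(f"DAO members with a full data upload: {upload_rate:.1%}")
```

By this rough measure, DAO members make up about 0.19% of Reddit's user base, and roughly 15% of them have uploaded their full data.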

The DAO also faces challenges in fairly distributing payments received from data buyers. Currently, the DAO awards cryptocurrency “tokens” to users based on their Reddit karma. However, karma may not be an accurate measure of the quality of a user's contributions to the data set, especially in smaller Reddit communities where there are fewer opportunities to earn it.
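The article does not say exactly how karma translates into tokens, so the following is only a minimal sketch under one assumed rule: a fixed token pool split in direct proportion to each member's karma. The function name `allocate_tokens`, the sample usernames, the pool size, and the choice to clamp negative karma to zero are all hypothetical and are not taken from Vana or the DAO.

```python
def allocate_tokens(karma_by_user: dict[str, int], token_pool: float) -> dict[str, float]:
    """Split token_pool across users in proportion to their karma.

    Assumption: negative karma is treated as zero so no one gets a negative payout.
    """
    weights = {user: max(karma, 0) for user, karma in karma_by_user.items()}
    total = sum(weights.values())
    if total == 0:
        # Nobody has positive karma, so nothing is distributed.
        return {user: 0.0 for user in weights}
    return {user: token_pool * weight / total for user, weight in weights.items()}


# Hypothetical members: one active in a huge subreddit, one in a small niche community.
karma = {"big_sub_poster": 90_000, "niche_expert": 9_000, "lurker": 1_000}
allocation = allocate_tokens(karma, token_pool=1_000.0)

for user, tokens in allocation.items():
    print(f"{user}: {tokens:.1f} tokens")
```

Under this assumed rule, the member from the large subreddit receives ten times the payout of the niche contributor regardless of how useful either one's posts are as training data, which illustrates the concern about karma as a proxy for quality.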

To potentially increase the value of the DAO, Kazlauskas suggests that members could choose to share their cross-platform and demographic data. However, this would require even more trust in Vana to responsibly handle sensitive information.

In my opinion, Vana’s DAO is unlikely to reach critical mass. But it is not the first attempt to give users control over the data used for generative AI model training. Startups like Spawning and vendors like Getty Images, Shutterstock, and Adobe are also working on solutions. No one seems to have cracked the code yet, but perhaps someone will find a way, or policymakers will intervene.

Dylan Williams

Dylan Williams is a multimedia storyteller with a background in video production and graphic design. He has a knack for finding and sharing unique and visually striking stories from around the world.
