Human Native AI: Revolutionizing the Marketplace for AI Training Data Licensing Deals

Human Native AI is a London-based startup building a marketplace to broker AI training data licensing deals between the many companies building LLM projects and those willing to license data to them. It also helps rights holders prepare and price their content, and monitors for copyright infringements. The company takes a cut of each deal and charges AI companies for its transaction and monitoring services. This week, Human Native AI announced a £2.8 million seed round led by LocalGlobe and Mercuri, two British micro VCs. It is also a smart time for Human Native AI to launch.

Adobe Also Joining the Race: Developing Generative Video Technology

Offered as an answer of sorts to OpenAI’s Sora, Google’s Imagen 2 and models from the growing number of startups in the nascent generative AI video space, Adobe’s model, part of the company’s expanding Firefly family of generative AI products, will make its way into Premiere Pro, Adobe’s flagship video editing suite, sometime later this year, Adobe says. Like many generative AI video tools today, Adobe’s model creates footage from scratch (from either a prompt or reference images), and it powers three new features in Premiere Pro: object addition, object removal and generative extend. The lack of a firm release time frame for the video model doesn’t instill much confidence that it will ship as promised. And that, I’d say, captures the overall tone of Adobe’s generative video presser. Adobe’s clearly trying to signal with these announcements that it’s thinking about generative video, if only in the preliminary sense.

Empowering Redditors: Vana’s Proposal to Lease User Data for AI Training

Vana plans to let users rent out their Reddit data to train AI. The startup says it wants users to get paid for training data; in the generative AI boom, data is the new oil. As Vana describes it: “It does this by allowing users to aggregate their personal data in a non-custodial way … Vana allows users to own AI models and use their data across AI applications.” Here’s how Vana pitches its platform and API to developers: “The Vana API connects a user’s cross-platform personal data … to allow you to personalize your application.” This month, Vana launched what it’s calling the Reddit Data DAO (Digital Autonomous Organization), a program that pools multiple users’ Reddit data (including their karma and post history) and lets them decide together how that combined data is used. “We have crunched the numbers and r/datadao is now the largest data DAO in history: Phase 1 welcomed 141,000 reddit users with 21,000 full data uploads.” “Reddit does not share non-public, personal data with commercial enterprises, and when Redditors request an export of their data from us, they receive non-public personal data back from us in accordance with applicable laws.”

BlaBlaCar Secures $108 Million in Debt Line After Profitability Milestone

The carpooling and bus ticketing company has been around for so long that it’s hard to consider it a startup anymore. Today, the company is announcing that it has secured a €100 million revolving credit facility ($108 million at today’s exchange rate). And the good news is that BlaBlaCar has users all around the world, not just in France. When the war in Ukraine started, BlaBlaCar had millions of users in Russia. And even if you don’t book your next train ride on BlaBlaCar, the company is also experimenting with last-mile carpooling.

Training Corporate Employees on Data and AI: Modal Secures $25M Investment

A few years ago, Darren Shimkus, ex-president of Udemy, had a conversation with Dennis Yang about skills building. Modal provides personalized technical skills training for a company’s staff, offering on-demand coaching and a pedagogical approach that groups users into semi-structured online learning communities. Modal sets itself apart, Shimkus says, first by homing in on hot trends: data and AI. “The rise of AI is bringing more visibility to data teams than ever before,” Shimkus said. “It’s hard in today’s ever-changing workplace landscape to predict what your teams need, meaning most leaders don’t have a reliable way to plan for and improve their team’s skills.”

Pienso creates AI model training tools without coding requirements

“So much of the AI conversation has been dominated by … large language models,” Jones said, “but the reality is that no one model can do everything. Pienso believes that any domain expert, not just an AI engineer, should be able to do just that.” Pienso guides users through the process of annotating or labeling training data for pre-tuned open source or custom AI models. “Pienso’s flexible, no-code interface allows teams to train models directly using their own company’s data,” Jones said. “This alleviates the privacy concerns of using … models, and also is more accurate, capturing the nuances of each individual company.” Companies pay Pienso a yearly license fee based on the number of AI models they deploy. “It’s fostering a future where we’re building smarter AI models for a specific application, by the people who are most familiar with the problems they are trying to solve.”

Compensating Creators: OpenAI’s Vice President on Paying for Training Data

Should artists whose work was used to train generative AI like ChatGPT be compensated for their contributions? OpenAI is in a delicate legal position when it comes to how it uses data to train generative AI systems such as the art-creating tool DALL-E 3, which is incorporated into ChatGPT. “Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents,” the company wrote in a January blog post. OpenAI has licensing agreements in place with some content providers, such as Shutterstock, and allows webmasters to block its web crawler from scraping their sites for training data. In addition, like some of its rivals, OpenAI lets artists “opt out” of and remove their work from the data sets that the company uses to train its image-generating models.
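For webmasters, the crawler opt-out mentioned above works through a standard robots.txt file. OpenAI's crawler identifies itself with the user-agent token `GPTBot`, so a site can block it while leaving other crawlers alone. The minimal sketch below, assuming a site-wide block is what the owner wants, uses Python's built-in `urllib.robotparser` to confirm that such a file actually denies `GPTBot`. The `is_allowed` helper, the `SomeOtherBot` agent and the example.com URL are hypothetical placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's GPTBot everywhere,
# but leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string instead of fetching
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/gallery/artwork-123"  # placeholder URL
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Note that robots.txt only governs future crawling by bots that choose to honor it; it does not remove work already collected, which is why the separate artist opt-out process exists.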

Automating AI Training Data Curation: DatologyAI’s Revolutionary Technology

Massive training data sets are the gateway to powerful AI models, but often also those models’ downfall. Morcos’ company, DatologyAI, builds tooling to automatically curate data sets like those used to train OpenAI’s ChatGPT, Google’s Gemini and other similar GenAI models. “However, not all data are created equal, and some training data are vastly more useful than others.” History has shown that automated data curation doesn’t always work as intended, however sophisticated the method or diverse the data. The largest vendors today, from AWS to Google to OpenAI, rely on teams of human experts and (sometimes underpaid) annotators to shape and refine their training data sets.
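DatologyAI's own methods aren't described here, so the following is only a generic illustration of what the simplest form of "automated curation" can look like: normalize each document, drop exact duplicates by hashing, then apply two heuristic quality filters. The `curate` function and its thresholds (`min_words`, `max_symbol_ratio`) are hypothetical and untuned; production pipelines typically go much further (near-duplicate detection, model-based quality scoring), and this is not DatologyAI's technique.

```python
import hashlib
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())


def curate(docs: Iterable[str], min_words: int = 5, max_symbol_ratio: float = 0.3) -> List[str]:
    """Toy curation pass: exact dedup plus two heuristic quality filters.

    min_words and max_symbol_ratio are illustrative thresholds, not tuned values.
    """
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        if not norm or len(norm.split()) < min_words:
            continue  # empty or too short to be useful
        symbols = sum(1 for c in norm if not (c.isalnum() or c.isspace()))
        if symbols / len(norm) > max_symbol_ratio:
            continue  # mostly punctuation or markup debris
        kept.append(doc)
    return kept


SAMPLE_DOCS = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick   brown fox jumps over the LAZY dog.",  # duplicate once normalized
    "Too short.",
    "$$$ ### @@@ !!! %%% ^^^ &&& ***",  # symbol debris
    "Training data quality matters more than raw volume for many models.",
]

if __name__ == "__main__":
    for doc in curate(SAMPLE_DOCS):
        print(doc)
```

On the sample list, only the first and last documents survive; the point of the sketch is that each filter encodes a judgment about which data is "more useful," which is exactly where simple heuristics can go wrong.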

Maximizing Start-Up Success: 5 Key Strategies for Effective LLM Deployment

An April 2023 Arize survey found that 53% of respondents planned to deploy LLMs within the next year. The H100 GPU from Nvidia, a popular choice for LLMs, has been selling on the secondary market for about $40,000 per chip. One source estimated it would take roughly 6,000 chips to train an LLM comparable to GPT-3.5. The same source estimated that running GPT-3.5 consumes about 1 GWh of energy a day, the combined daily energy usage of 33,000 households. Power consumption can also hurt the user experience when running LLMs on portable devices.
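The estimates above can be combined into a quick back-of-the-envelope check. This sketch simply multiplies and divides the quoted figures; the constants are the source's estimates rather than measurements, and the helper names are hypothetical.

```python
# Figures quoted in the text (estimates, not measurements).
H100_PRICE_USD = 40_000        # secondary-market price per H100 chip
CHIPS_TO_TRAIN = 6_000         # chips to train a GPT-3.5-class model
DAILY_ENERGY_KWH = 1_000_000   # 1 GWh per day to run the model
HOUSEHOLDS = 33_000            # households with the same combined daily usage


def hardware_cost(chips: int, price_usd: int) -> int:
    """GPU purchase cost only: ignores networking, power, cooling and staff."""
    return chips * price_usd


def kwh_per_household(total_kwh: float, households: int) -> float:
    """Daily energy per household implied by the comparison in the text."""
    return total_kwh / households


if __name__ == "__main__":
    cost = hardware_cost(CHIPS_TO_TRAIN, H100_PRICE_USD)
    per_home = kwh_per_household(DAILY_ENERGY_KWH, HOUSEHOLDS)
    print(f"GPU outlay: ${cost:,}")
    print(f"Implied usage per household: {per_home:.1f} kWh/day")
```

Running it prints a GPU outlay of $240,000,000 and about 30.3 kWh per household per day, which shows the household comparison assumes roughly 30 kWh of daily consumption per home.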