Uncovering the Unsung Heroes of AI: Honoring the Data Annotators


The Fast-Moving World of AI: Keeping Up with Current Developments

The artificial intelligence (AI) industry is evolving constantly, making it a tall order to keep up with all the latest advancements. Until the day an AI can do it for us, here’s a handy roundup of recent stories and notable research from the world of machine learning.

Labeling and Annotation Startups: The Unsung Heroes of AI

This week in AI, it’s time to shine a spotlight on labeling and annotation startups – companies like Scale AI, which is reportedly in talks to raise new funds at a staggering $13 billion valuation. These platforms may not get as much attention as flashy new generative AI models like OpenAI’s Sora, but they play a crucial role in the development of modern AI.

The data on which many AI models train must be labeled: tags or markings help the models understand and interpret the data during the training process. For example, an image recognition model may be trained with labels in the form of markings drawn around objects (“bounding boxes”) or captions describing each person, place, or object depicted in an image.
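To make this concrete, here is a minimal sketch of what a single bounding-box annotation record might look like. The schema and field names are illustrative only, not those of any particular labeling platform:

```python
# A minimal, hypothetical annotation record for one image.
# Field names and categories are illustrative, not any platform's real schema.
annotation = {
    "image_id": "street_scene_0042.jpg",
    "labels": [
        {
            "category": "pedestrian",
            # Bounding box as (x_min, y_min, x_max, y_max) in pixel coordinates.
            "bbox": (104, 220, 168, 410),
        },
        {
            "category": "traffic_light",
            "bbox": (512, 35, 540, 98),
        },
    ],
    # A free-text caption is another common form of label.
    "caption": "A pedestrian waits at a crosswalk under a red traffic light.",
}

# During training, a model sees the raw image alongside records like this one,
# learning to associate pixel regions with the annotated categories.
for label in annotation["labels"]:
    print(label["category"], label["bbox"])
```

A production dataset can contain millions of records like this, each one produced or reviewed by a human annotator.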

The accuracy and quality of these labels have a significant impact on the performance and reliability of the trained models. However, annotation is a massive undertaking, often requiring thousands to millions of labels for the larger and more complex datasets used in AI training.

One might assume that data annotators are treated well, paid living wages, and given the same benefits as the engineers who build the models. Unfortunately, this is often not the case due to the harsh working conditions fostered by many annotation and labeling startups.

Some companies, like OpenAI, have billions of dollars in funding, yet they rely on annotators in developing countries who are paid only a few dollars per hour. These workers are often exposed to distressing content, such as graphic imagery, without adequate time off or access to mental health resources.

A recent article in NY Mag exposed the working conditions at Scale AI, which hires annotators in places as far-flung as Nairobi, Kenya. Some tasks on Scale AI can take labelers multiple eight-hour workdays, with no breaks, and pay as little as $10. These workers are also at the mercy of the platform, sometimes going long stretches without receiving any work or being abruptly kicked off it.

While some annotation and labeling platforms claim to offer “fair-trade” work, there are currently no regulations or strong industry standards defining ethical labeling work. Each company sets its own definition, leading to wide variation in how data annotators are treated.

So, what can be done about this? The need to label and annotate data for AI training will not go away without a significant technological breakthrough. It is not realistic to expect these platforms to self-regulate, so the best hope for change may lie in policymaking. While this is a complex and challenging prospect, it is the most promising way to improve conditions for data annotators.

Other Notable AI Stories from the Past Few Days

Here are some other AI stories that caught our attention recently:

  • How’s the weather? AI is becoming increasingly capable of answering this question. Although we previously highlighted some efforts in forecasting, the field is evolving rapidly. Recently, the teams behind MetNet-3 and GraphCast published a paper on a new system called SEEDS (Scalable Ensemble Envelope Diffusion Sampler). This system uses diffusion to generate “ensembles” of plausible weather outcomes for a given area much faster than traditional physics-based models, improving edge case coverage and confidence in forecasted scenarios.
  • At Fujitsu, researchers are applying AI image-processing techniques to underwater imagery and lidar data collected by autonomous underwater vehicles. The goal is to create a “digital twin” of underwater environments, which can help simulate and predict new developments. The work is in its early stages, but it has the potential to advance our understanding of the natural world.
  • Linear functions may be key to understanding the intelligence of large language models (LLMs). Researchers at MIT found that these models, often considered highly complex and nonlinear, can mimic intelligence in a much simpler way: by using linear functions to represent a feature’s significance, they can still achieve impressive results. Such simple mechanisms may struggle with context and feedback, however, which could matter in certain scenarios (e.g., human-robot interactions). A toy sketch of this kind of linear probe appears after this list.
  • Disney Research has been exploring automated character interactions for some time, and it recently published a paper on using name pronunciation and reuse to improve those interactions. By extracting the phonemes from a spoken introduction rather than relying only on the written name, AI systems can understand and pronounce names more accurately, which is crucial for person-to-machine interactions.
  • As AI and search continue to intersect, it’s essential to reassess how these tools are used and the potential risks that may arise. Safiya Umoja Noble, a leading voice in AI and search ethics, spoke with the UCLA news team about the need to remain vigilant about bias and harmful practices in search. Her insights serve as a reminder that as AI becomes more integrated into our lives, we must continue to prioritize ethical standards.
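As a rough illustration of the linear-functions idea from the MIT item above, here is a toy sketch of a linear probe: a plain least-squares fit that tries to read a feature back out of a model’s hidden activations. The activations and target feature below are synthetic stand-ins, not outputs of a real language model:

```python
import numpy as np

# Toy stand-ins: in the real setting, `activations` would be hidden states
# pulled from a language model and `feature` would be some property of the
# input (e.g., sentiment, tense, or the identity of an entity).
rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 64))   # 1000 examples, 64-dim hidden states
true_direction = rng.normal(size=64)
feature = activations @ true_direction + 0.1 * rng.normal(size=1000)

# A linear probe is just a least-squares fit: find a weight vector that maps
# activations to the feature value.
weights, *_ = np.linalg.lstsq(activations, feature, rcond=None)

# If the feature really is encoded linearly, the probe recovers it well.
predicted = activations @ weights
r2 = 1 - np.var(feature - predicted) / np.var(feature)
print(f"Probe R^2 on synthetic data: {r2:.3f}")
```

When a feature is encoded along a linear direction, a fit this simple recovers it with high accuracy, which is the kind of surprising simplicity the MIT researchers describe.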

Stay tuned for the latest AI developments and be sure to check back for more stories and insights in the weeks to come!

Dylan Williams

Dylan Williams is a multimedia storyteller with a background in video production and graphic design. He has a knack for finding and sharing unique and visually striking stories from around the world.
