
“Automating AI Training Data Curation: DatologyAI’s Revolutionary Technology”

Massive training data sets are the gateway to powerful AI models, but often also those models’ downfall. Morcos’ company, DatologyAI, builds tooling to automatically curate data sets like those used to train OpenAI’s ChatGPT, Google’s Gemini and similar GenAI models. “However, not all data are created equal, and some training data are vastly more useful than others.” History has shown that automated data curation doesn’t always work as intended, however sophisticated the method or diverse the data. The largest vendors today, from AWS to Google to OpenAI, rely on teams of human experts and (sometimes underpaid) annotators to shape and refine their training data sets.