
Exploring AI: Combatting Racial Bias in Image Generating Technology

This week in AI, Google paused its AI chatbot Gemini’s ability to generate images of people after a segment of users complained about historical inaccuracies. Google’s ginger treatment of race-based prompts in Gemini didn’t avoid the issue, per se, but disingenuously attempted to conceal the worst of the model’s biases. Yes, the data sets used to train image generators generally contain more white people than Black people, and yes, the images of Black people in those data sets reinforce negative stereotypes. That’s why image generators sexualize certain women of color, depict white men in positions of authority and generally favor wealthy Western perspectives. Whether AI vendors tackle, or choose not to tackle, models’ biases, they’ll be criticized.

The Power of Politeness: How Treating a Chatbot with Kindness Can Improve Its Performance!

Phrasing requests in a certain way — meanly or nicely — can yield better results with chatbots like ChatGPT than prompting in a more neutral tone. So what’s the deal with emotive prompts? Nouha Dziri, a research scientist at the Allen Institute for AI, theorizes that emotive prompts essentially “manipulate” a model’s underlying probability mechanisms. Why is it so trivial to defeat safeguards with emotive prompts? Another reason could be a mismatch between a model’s general training data and its “safety” training datasets, Dziri says — i.e.

Reddit rakes in over $200M through data licensing agreements

In its IPO prospectus filed today with the U.S. Securities and Exchange Commission, Reddit repeatedly emphasized how much it thinks it stands to gain, and has gained, from data licensing agreements with the companies training AI models on its over one billion posts and over 16 billion comments. “In January 2024, we entered into certain data licensing arrangements with an aggregate contract value of $203.0 million and terms ranging from two to three years,” the prospectus reads. “We expect a minimum of $66.4 million of revenue to be recognized during the year ending December 31, 2024 and the remaining thereafter.” For now, it’s a mystery which AI vendors are licensing data from Reddit. Why’s Reddit data valuable? Reddit previously didn’t gate access to its data for AI training purposes.

Consistent Expansion: AI Imaging in the Lead with Stable Diffusion 3 Surpassing Sora and Gemini

Stability has announced Stable Diffusion 3, the latest and most powerful version of the company’s image-generating AI model. Sora, OpenAI’s impressive video generator, apparently works on similar principles (Will Peebles, co-author of the paper, went on to co-lead the Sora project). (Anthropic, for its part, has not focused on image or video generation publicly, so it isn’t really part of this conversation.) Stable Diffusion seems to want to be the white label generative AI that you can’t do without, rather than the boutique generative AI you aren’t sure you need. Interestingly, the company has put safety front and center in its announcement, stating: “We have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors.”

Automating AI Training Data Curation: DatologyAI’s Revolutionary Technology

Massive training data sets are the gateway to powerful AI models, but often also those models’ downfall. Morcos’ company, DatologyAI, builds tooling to automatically curate data sets like those used to train OpenAI’s ChatGPT, Google’s Gemini and other similar GenAI models. “However, not all data are created equal, and some training data are vastly more useful than others.” History has shown automated data curation doesn’t always work as intended, however sophisticated the method or diverse the data. The largest vendors today, from AWS to Google to OpenAI, rely on teams of human experts and (sometimes underpaid) annotators to shape and refine their training data sets.

Exploring New Materials through Artificial Intelligence: A Startup’s Innovative Approach

Orbital Materials, founded by Jonathan Godwin, who previously was involved with DeepMind’s material research efforts, is creating an AI-powered platform that can be used to discover materials ranging from batteries to carbon dioxide-capturing cells. Godwin says he was inspired to found Orbital Materials by seeing how the techniques underpinning AI systems like AlphaFold, DeepMind’s AI that can predict a protein’s 3D structure from its amino acid sequence, could be applied to the materials sciences. “[Yet] demand for new advanced materials … is growing hugely as our economies become electrified and de-carbonized.” Orbital Materials isn’t the first to apply AI to materials R&D. Osmium AI, led by an ex-Googler and backed by Y Combinator, enables industrial customers to predict the physical properties of new materials, then refine and optimize those new materials leveraging AI. But what sets Orbital Materials apart is its proprietary AI model for materials science, Godwin claims.

Introducing: Google’s Latest Additions – Two New Open LLM Programs

Barely a week after launching the latest iteration of its Gemini models, Google today announced the launch of Gemma, a new family of lightweight open-weight models. To get started with Gemma, developers can get access to ready-to-use Colab and Kaggle notebooks, as well as integrations with Hugging Face, MaxText and Nvidia’s NeMo. While Google highlights that these are open models, it’s worth noting that they are not open-source. Indeed, in a press briefing ahead of today’s announcement, Google’s Janine Banks stressed the company’s commitment to open source but also noted that Google is very intentional about how it refers to the Gemma models. “[Open models] has become pretty pervasive now in the industry,” Banks said.

Bioptimus Secures $35M in Seed Funding to Propel AI-Driven Biological Model Development

There’s a new generative AI startup based in Paris. Bioptimus will leverage this unique data set to train its foundational model. Creating new AI models is such a daunting task that creating a separate entity made more sense. “As a ‘pure player’ in foundational models, Bioptimus is better set up to do this.” The startup has also signed a partnership with Amazon Web Services. Now that Bioptimus is well funded, it’s time to work on the AI model and see what the biotech research community can do with it.

Dili’s Desire: Harnessing AI for Automated Due Diligence

“[AI] affects all parts of an investment fund, from analysts to partners and back-office functions,” Song said. Dili isn’t the first to apply AI to the due diligence process. Gartner predicts that by 2025, more than 75% of VC and early-stage investor executive reviews will be informed using AI and data analytics. The question is, can Dili’s AI — or any AI really — be trusted when it comes to managing a portfolio? Dili ran an initial pilot last year with 400 analysts and users across different types of funds and banks.

Semron Seeks to Substitute ‘Memcapacitors’ for Chip Transistors

The electric field approach minimizes the movement of electrons at the chip level, reducing energy usage — and heat. TOPS/W is a bit of a vague metric, but the takeaway is that memcapacitors can lead to dramatic energy consumption reductions while training AI models. Now, it’s early days for Semron, which Kirschen says is in the “pre-product” stage and has “negligible” revenue to show for it. EnCharge, like Semron, is designing computer chips that use capacitors rather than transistors, but using a different substrate architecture. Semron will be a key element in solving this problem by providing a revolutionary new chip that is inherently specialized on computing AI models.