Google’s Image-Generating AI: A Confession of Loss of Control


Google has issued an apology, or something close to one, for yet another embarrassing AI blunder this week: an image-generating model that injected diversity into pictures with little regard for historical context. And while Google can blame the model itself, it’s worth remembering that the model didn’t create itself.

The AI system at the center of this mishap is Gemini, Google’s flagship conversational AI platform, which calls on a version of the Imagen 2 model to create images on demand. Users recently discovered that asking it to depict certain historical circumstances or figures produced laughable, inaccurate results. For example, the founding fathers, known to have been white slave owners, were rendered as a racially diverse group that included people of color.

As you might expect, this easily replicated error was quickly criticized and mocked online. It also got pulled into the ongoing debate surrounding diversity, equity, and inclusion, and was held up by pundits as evidence of the supposed “woke mind virus” infiltrating the already liberal tech sector.

Outraged commentators blamed the problem on unchecked DEI policies and Google’s supposed “ideological echo chamber.” It must be noted, however, that plenty of people on the left were also disturbed by this bizarre behavior.

But for those familiar with the tech industry, and as explained in Google’s apologetic statement, this problem stemmed from a reasonable workaround for systemic bias in training data.

Say you want to use Gemini to create a marketing campaign, and you ask it to generate 10 pictures of “a person walking a dog in a park.” Because you don’t specify the type of person, dog, or park, it’s dealer’s choice – the generative model will put out what it is most familiar with.

This often produces visuals that reflect the training data rather than reality, and that data can carry strong biases. In most cases it is heavily weighted toward images of white people – stock imagery, rights-free photography, and the like – so the model defaults to white individuals unless told otherwise.
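To see why, here is a toy sketch – the proportions and labels are entirely made up, and no real model or dataset is involved: when nothing in the prompt constrains the choice, sampling from a skewed pool returns the majority case almost every time.

```python
# Toy illustration only: made-up proportions, no real model or dataset.
# If the vast majority of "person walking a dog" images a model has seen look
# one way, an unconstrained request will come back that way almost every time.
import random

random.seed(0)
pool = (["white man, golden retriever, suburban park"] * 90
        + ["any other combination of person, dog, and park"] * 10)

samples = [random.choice(pool) for _ in range(10)]
majority = samples.count("white man, golden retriever, suburban park")
print(f"{majority} of 10 unconstrained samples match the majority case")
```

The dynamics inside an image model are far more complicated than a weighted coin flip, but the direction of the effect is the same: whatever dominates the training data dominates the unprompted output.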

Google explains, “Because our users come from all over the world, we want it to work well for everyone. If you ask for a picture of football players, or someone walking a dog, you may want to receive a range of people. You probably don’t just want to only receive images of people of just one type of ethnicity (or any other characteristic).”

It’s not an issue to receive an image of a white man walking a golden retriever in a suburban park. But if you ask for ten images and they all depict white men – especially if you live in a place like Morocco where the people, dogs, and parks look very different – then it becomes a problem. In these situations, the model should strive for diversity, not homogeneity.

This problem shows up in every kind of generative media, and there is no simple fix. To mitigate it, companies like Google, OpenAI, and Anthropic routinely include hidden instructions for the model.

It’s worth emphasizing how common these implicit instructions are. The entire LLM ecosystem relies on them. Sometimes called “system prompts,” they give the model standing guidelines – “be concise,” “avoid profanity,” and so on – before every conversation. Ask for a joke, for instance, and the model won’t offer a racist one, because it has been trained, like most of us, to avoid that kind of humor. This isn’t a hidden agenda – though more transparency would be welcome – it’s simply infrastructure.
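As a rough picture of how that works, here is a minimal sketch of a system prompt being prepended to a user’s request before it reaches the model. The instruction text and function names are hypothetical, not any vendor’s actual implementation.

```python
# Hypothetical sketch of a "system prompt": hidden guidance prepended to every
# conversation before the user's message reaches the model. The wording and
# structure here are illustrative, not any vendor's real implementation.

SYSTEM_PROMPT = (
    "Be concise. Avoid profanity. Do not produce jokes or imagery that "
    "demean any group of people."
)

def build_conversation(user_message: str) -> list[dict]:
    """Wrap the user's message with the product team's hidden instructions."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # the user never sees this
        {"role": "user", "content": user_message},
    ]

if __name__ == "__main__":
    for turn in build_conversation("Tell me a joke."):
        print(f"{turn['role']:>6}: {turn['content']}")
```

The user only ever types the last line; everything above it was written, tested, and shipped by people.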

In Google’s case, the implicit instructions failed to account for situations where historical context mattered. Quietly adding something like “the person is of a random gender and ethnicity” to a prompt such as “a person walking a dog in a park” produces a more representative spread of results; adding it to “the US founding fathers signing the Constitution” plainly does not.
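One way to picture the failure – purely as an illustration, since Google has not published Gemini’s actual prompt logic – is a rewriting step that appends a diversity instruction to every image request. Without some check for historically specific prompts, the blanket rule misfires exactly as described above.

```python
# Purely illustrative: a naive prompt-rewriting step. Nothing here reflects
# Gemini's real internals; it only shows why a blanket rule needs an exception
# for historically specific requests.

HISTORICAL_HINTS = ("founding fathers", "signing the constitution",
                    "17th century", "ancient rome")

def augment(prompt: str, guard_history: bool = True) -> str:
    """Append a diversity hint, optionally skipping historically pinned prompts."""
    if guard_history and any(hint in prompt.lower() for hint in HISTORICAL_HINTS):
        return prompt  # leave historically specific prompts untouched
    return f"{prompt}, showing people of a random gender and ethnicity"

print(augment("a person walking a dog in a park"))
print(augment("the US founding fathers signing the Constitution", guard_history=False))
# The second call, run with no guard, is the kind of blanket rewrite that
# produced the embarrassing results; a real fix needs far more than a keyword list.
```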

As stated by Google’s SVP Prabhakar Raghavan, “First, our tuning to ensure that Gemini showed a range of people failed to account for cases that should clearly not show a range. And second, over time, the model became way more cautious than we intended and refused to answer certain prompts entirely – wrongly interpreting some very anodyne prompts as sensitive. These two things led the model to overcompensate in some cases, and be over-conservative in others, leading to images that were embarrassing and wrong.”

While it may be hard to admit fault, I forgive Prabhakar for stopping just short of an apology. What stands out is one choice of words in his statement: “The model became way more cautious than we intended.”

But how can a model “become” anything? It’s software. Someone – thousands of Google engineers, in this case – built it, tested it, and iterated on it. Someone wrote the implicit instructions that improved certain responses and caused others to fail in a comical manner. If someone were to inspect the full prompt, it’s likely they would discover what the Google team did wrong.

Google blames the model for “becoming” something that it wasn’t “intended” to be. But the model was made by them! It’s similar to breaking a glass and saying “it fell” instead of admitting fault. (I’ve been guilty of this myself).

Mistakes made by these models are bound to happen, as they can often lack understanding and reflect biases while behaving in unexpected ways. However, the responsibility for these mistakes does not lie with the models themselves, but with the people who created them. Today, it may be Google’s mistake, tomorrow it could be OpenAI’s, and in the near future, X.AI’s.

These companies have a vested interest in convincing the public that AI is capable of making its own mistakes. It’s important not to be fooled by their claims.


Max Chen

Max Chen is an AI expert and journalist with a focus on the ethical and societal implications of emerging technologies. He has a background in computer science and is known for his clear and concise writing on complex technical topics. He has also written extensively on the potential risks and benefits of AI, and is a frequent speaker on the subject at industry conferences and events.
