Google Aims to Resolve Gemini’s Image Diversity Problem Within Weeks

Google hopes to soon restore the ability of its multimodal generative AI tool, Gemini, to depict people, according to DeepMind co-founder Demis Hassabis. The ability to generate images of people in response to prompts should be back online in the “next few weeks”, he said at the Mobile World Congress in Barcelona.

Last week, Google paused Gemini’s ability to generate images of people after users complained that the tool was producing historically inaccurate images. These included depictions of the US founding fathers as a diverse group of people rather than solely as white men.

In an on-stage interview at the Mobile World Congress in Barcelona, Hassabis fielded questions about the product mishap.

Asked by moderator Steven Levy of Wired to explain what went wrong with the image generation feature, Hassabis sidestepped a detailed technical explanation. Instead, he suggested the issue stemmed from Google failing to recognize when users are after what he described as a “universal depiction”. He also pointed to the broader complexities that come with advanced AI.

“This is a challenge we are all facing. For instance, if a prompt requests an image of a person walking a dog or a nurse in a hospital, the user is clearly seeking a ‘universal depiction’. This is especially important to consider as Google serves over 200 countries worldwide and cannot predict the user’s background or situation. It’s essential to present a universal range of possibilities.”

Hassabis traced the problem to a “well-intended feature” meant to promote diversity in Gemini’s images of people. However, the feature was applied too broadly, across all prompts for images of people, including those where a universal depiction was not appropriate.

Hassabis said that prompts for images of historical figures should produce a “much narrower distribution” of outputs, hinting at how Gemini might handle prompts involving people in the future.

“It’s imperative that we prioritize historical accuracy. As a result, we have temporarily removed that feature and are working to resolve it. We aim to have it up and running in the next few weeks.”

Asked how to stop malicious actors from misusing generative AI tools to spread propaganda, Hassabis admitted that there is no simple solution. He described the issue as “very complex” and suggested it will likely require a whole-of-society response to determine and enforce limits on misuse.

Hassabis stressed the need for extensive research and discussion involving not only tech companies but also civil society and governments. The question is as much societal as technical, he suggested, because society as a whole has to decide what values these systems should hold and what they should represent. He also acknowledged that bad actors may exploit these technologies for harmful ends their creators never intended.

Hassabis also addressed the challenges posed by open source AI models, which Google also offers. Customers want open source systems they can fully control, he said, but that raises the question of how to ensure downstream users do not turn increasingly powerful systems to harmful uses.

Hassabis acknowledged that this may not be a pressing issue today, since the systems are still at a relatively early stage. However, as future systems gain capabilities such as planning and problem-solving, he said society will need to think seriously about the consequences of their widespread use and their potential misuse by malicious actors.

Beyond the broader implications of AI, Hassabis was asked about the future of AI devices and their impact on the mobile market. He predicted a wave of “next generation smart assistants” that will be genuinely useful in people’s daily lives, unlike the more superficial features of earlier AI assistants. He also suggested this could change the kind of mobile hardware people choose to carry with them.