AI models have been playing games for decades with one goal in mind: winning. However, researchers at Google DeepMind have taken a different approach with their latest creation. Their model not only learns to play multiple 3D games like a human, but also strives to understand and act on verbal instructions.
While there are existing “AI” characters in games that can follow commands, they are limited to formal in-game commands and can’t replicate human-like behavior. DeepMind’s SIMA (Scalable Instructable Multiworld Agent) was trained on hours of video footage of humans playing games, along with annotations provided by data labelers. From this data, the model learns to associate visual representations with actions, objects, and interactions.
Researchers even recorded videos of players giving instructions to one another in-game, helping the model learn to understand and follow verbal cues. For example, it can learn that a certain pattern of pixels on the screen represents the action of “moving forward,” or that interacting with a door-like object is equivalent to “opening a door.”
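The learning setup described above amounts to a form of behavior cloning: given an observation and an instruction, imitate the action a human took. The toy sketch below is purely illustrative (the class, names, and symbolic "observations" are hypothetical, not DeepMind's code or data format), but it shows the shape of learning such associations from annotated demonstrations:

```python
# Toy sketch (hypothetical, not SIMA's actual code): associate
# (observation, instruction) pairs with the actions humans took,
# in the spirit of behavior cloning from annotated gameplay.

from collections import Counter, defaultdict

class ToyAgent:
    def __init__(self):
        # (observation, instruction) -> counts of actions seen in the data
        self.table = defaultdict(Counter)

    def train(self, demonstrations):
        """demonstrations: iterable of (observation, instruction, action)."""
        for obs, instruction, action in demonstrations:
            self.table[(obs, instruction)][action] += 1

    def act(self, obs, instruction):
        counts = self.table.get((obs, instruction))
        if not counts:
            return "noop"  # never saw this situation in training
        return counts.most_common(1)[0][0]

# Symbolic stand-ins for what would really be video frames and labels.
demos = [
    ("door_ahead", "open the door", "interact"),
    ("door_ahead", "open the door", "interact"),
    ("open_field", "move forward", "press_w"),
]
agent = ToyAgent()
agent.train(demos)
print(agent.act("door_ahead", "open the door"))  # -> interact
```

A real agent would of course use a learned neural policy over raw pixels rather than a lookup table; the table only makes the instruction-to-action association concrete.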
The training videos were taken from a variety of games, including Valheim and Goat Simulator 3, whose developers were involved in the project and consented to the use of their software. One of the researchers’ main goals was to test the model’s ability to play games it hasn’t encountered before, a process known as generalization.
The results showed that the AI agents trained on multiple games performed better on unfamiliar games compared to those trained on just one game. However, unique and specific mechanics or terms in a game can still stump even the most prepared AI. The researchers believe that the model’s lack of exposure to these mechanics is the main barrier, but with more training data, it could learn them as well.
This is because, despite the vast amount of in-game lingo, there are only a limited number of “verbs” that truly affect the game world. Whether you’re building a lean-to, pitching a tent, or summoning a magical shelter, they all fall under the verb of “building a house.” The researchers’ map of the few dozen primitive actions the agent recognizes is intriguing.
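That collapsing of game-specific lingo onto a small set of primitives can be sketched as a simple normalization step. This is a hypothetical illustration of the idea, not the researchers' actual mapping (the phrase list and fallback rule are invented for the example):

```python
# Hypothetical sketch: many game-specific phrases collapse onto a
# small set of primitive "verbs", as the article describes.

SYNONYMS = {
    "build a lean-to": "build",
    "pitch a tent": "build",
    "summon a magical shelter": "build",
    "open the gate": "open",
    "unlock the chest": "open",
}

def canonical_verb(phrase: str) -> str:
    # Fall back to the phrase's first word if it isn't in the map.
    return SYNONYMS.get(phrase.lower(), phrase.split()[0].lower())

print(canonical_verb("Pitch a tent"))   # -> build
print(canonical_verb("unlock the chest"))  # -> open
```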
Beyond advancing the field of agent-based AI, the researchers’ ambition is to create a game-playing companion that is more natural and adaptable than today’s stiff, hard-coded ones.
One of the leads of the project, Tim Harley, explains, “Rather than playing against a superhuman agent, you can have cooperative SIMA players beside you that you can give instructions to.”
Since the agents only see the pixels on the game screen, they have to learn how to perform tasks much like humans do. While that makes learning harder, it also allows them to adapt and exhibit emergent behaviors.
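The pixels-in, human-style-actions-out contract described above can be made concrete with a small interface sketch. Everything here (the `Action` type, the `step` function, the keyword rule) is a hypothetical stub to illustrate the input/output shape, not SIMA's API:

```python
# Hypothetical interface sketch: the agent receives only screen pixels
# plus a text instruction and emits keyboard/mouse actions, just as a
# human player would.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Action:
    keys: List[str] = field(default_factory=list)  # e.g. ["w"] to move forward
    mouse_delta: Tuple[int, int] = (0, 0)          # camera movement in pixels

def step(frame_rgb, instruction: str) -> Action:
    """frame_rgb: H x W x 3 nested lists of pixel values (0-255)."""
    # A real agent would run a learned policy over the pixels here;
    # this stub only illustrates the contract.
    if "forward" in instruction:
        return Action(keys=["w"])
    return Action()

frame = [[[0, 0, 0]] * 4] * 3  # tiny placeholder "screen"
print(step(frame, "move forward").keys)  # -> ['w']
```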
Many companies are also exploring this type of open-ended collaboration and creation. For example, NPC conversations could utilize an LLM-type chatbot, and AI is being used to simulate and track simple improvised actions and interactions.
Some researchers are also looking into infinite games, such as MarioGPT, but that’s a topic for another time.