Google DeepMind's Genie 2 Turns Text and Images into 3D Games

TLDR

Google DeepMind unveiled Genie 2, an updated version of its AI model that can turn a single image and text description into a video game.
Genie 2 creates playable 3D environments for humans and AI bots. Genie 1 was only in 2D.
Google DeepMind said Genie 2 could be used to train AI-powered robots in simulations, which would be safer than doing so in the physical world.

Google DeepMind, the search giant’s AI division whose CEO just won a Nobel Prize, has unveiled an updated version of its AI model that can turn a single image and text description into a video game.

Called Genie 2, the foundation world model can create a variety of “action-controllable, playable 3D environments” that humans or AI agents can control with keyboard and mouse inputs. Genie 1 was limited to 2D.

Genie 2 is called a world model because it can simulate virtual worlds. Trained on a large-scale video dataset, Genie 2 can display object interactions, complex character animation, physics (such as gravity and splashing water effects) and behavior modeling of other agents. The worlds it generates can last up to a minute, with most lasting 10 to 20 seconds.

“This means anyone can describe a world they want in text, select their favorite rendering of that idea, and then step into and interact with that newly created world,” wrote Google DeepMind researchers in a blog post.

The games can be used to train generalist AI robots in different environments. A common problem is the lack of rich, diverse and safe training environments for so-called embodied AI, the researchers said.

How it works

Genie 2 is an autoregressive latent diffusion model trained on a large video dataset. An autoencoder first compresses video frames into simpler, meaningful representations. A transformer model then analyzes these compressed frames and predicts how the video should progress, step by step, much as text-generating models like ChatGPT predict the next piece of text from what came before.
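The compress-then-predict loop described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names (`encode`, `decode`, `predict_next`, `rollout`) are hypothetical, the "autoencoder" is simple averaging, and the "dynamics model" is a hand-written rule standing in for a learned transformer. It is not Google DeepMind's implementation.

```python
from typing import List, Sequence

Latent = List[float]


def encode(frame: Sequence[float], factor: int = 2) -> Latent:
    """Toy encoder (assumption): average each group of `factor` pixels."""
    return [sum(frame[i:i + factor]) / factor
            for i in range(0, len(frame), factor)]


def decode(latent: Latent, factor: int = 2) -> List[float]:
    """Toy decoder (assumption): repeat each latent value `factor` times."""
    return [value for value in latent for _ in range(factor)]


def predict_next(history: List[Latent], action: float) -> Latent:
    """Hypothetical stand-in for the learned dynamics model.

    It only sees past latents plus the current action, never future
    frames. Here it simply shifts the latest latent by the action value.
    """
    latest = history[-1]
    return [value + action for value in latest]


def rollout(first_frame: Sequence[float],
            actions: Sequence[float]) -> List[List[float]]:
    """Generate frames one step at a time in the compressed space."""
    history = [encode(first_frame)]
    frames = [decode(history[0])]
    for action in actions:
        next_latent = predict_next(history, action)
        history.append(next_latent)  # each prediction feeds the next step
        frames.append(decode(next_latent))
    return frames


if __name__ == "__main__":
    for frame in rollout([0.0, 2.0, 4.0, 6.0], actions=[1.0, -0.5]):
        print(frame)
```

The point of the sketch is the data flow: prediction happens on the small compressed representation, and each generated step is appended to the history that conditions the next one.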

When Genie 2 generates new video, it produces one frame at a time, using the previous frames and the player's actions as a guide. To make the output respond more faithfully to those actions, it uses a technique called classifier-free guidance.
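Classifier-free guidance is commonly written as a blend of two predictions from the same model: one made with the conditioning signal (here, the action) and one made without it. The sketch below shows that widely used form. The function name, the list-of-floats representation and the example numbers are assumptions for illustration, and how Genie 2 applies the technique internally has not been published in this detail.

```python
from typing import List, Sequence


def classifier_free_guidance(conditional: Sequence[float],
                             unconditional: Sequence[float],
                             scale: float) -> List[float]:
    """Push the prediction away from the unconditional output and
    toward the action-conditioned one.

    scale = 0 ignores the action, scale = 1 returns the conditional
    prediction unchanged, and scale > 1 exaggerates the action's effect.
    """
    return [u + scale * (c - u)
            for c, u in zip(conditional, unconditional)]


if __name__ == "__main__":
    with_action = [1.0, 2.0]     # hypothetical prediction given the action
    without_action = [0.5, 2.0]  # hypothetical prediction with action dropped
    print(classifier_free_guidance(with_action, without_action, scale=3.0))
```

A larger scale makes the generated frame follow the action more strongly, usually at some cost to variety in the output.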

Author

Deborah Yao
