A Truly Open Game: Google Creates Unbounded, the First Infinite Life Simulation Game

Deep Learning and NLP 2024/10/27 01:32

Source | The Heart of the Machine

If you play open-world or role-playing games, you have probably dreamed of a game with unlimited freedom: no invisible walls, no scripted story deaths, and no limits on interaction.

Now, our dreams may really be starting to come true.

Harnessing the power of large language models and visual generative models, Google's new game Unbounded offers a glimpse of what is possible.

A tweet about Unbounded by Jialu Li

The game world is generated by AI and can expand and evolve indefinitely as play progresses. Characters can be customized to the player's specifications, and there are no fixed interaction rules. Everything is open; not even your imagination sets the limits, much like the Mind Game in Ender's Game.

Footage of the Mind Game from the film Ender's Game

Although the game as a whole is relatively simple and serves more as a proof of concept, the possibilities it hints at are enough to spark endless speculation.

The design philosophy behind Unbounded can be traced back to James P. Carse's 1986 book Finite and Infinite Games, which describes two different kinds of games.

In Carse's definition, finite games are played "for the purpose of winning": they have boundaries, fixed rules, and a definite end point. Infinite games, by contrast, are played for the purpose of continuing the play: they have no fixed boundaries, and their rules keep evolving.

Traditional video games are essentially finite games, constrained by the limits of computer programming and computer graphics. All game mechanics must be fully predefined in a programming language, and all graphical assets must be designed in advance (procedural generation from modular components still has structural limits). Such games allow only a limited set of actions and paths, some of them fully scripted, and they usually have predefined rules, boundaries, and win conditions.

Advances in generative models open up entirely new possibilities for games. It even becomes conceivable to build what the authors call "generative infinite video games".

A recent paper by Google and the University of North Carolina at Chapel Hill explored this possibility, proposing the first interactive generative infinite game, Unbounded, in which game behaviors and outputs are generated by AI models, transcending the limitations of hard-coded systems.

According to the team, Unbounded draws inspiration from sandbox life simulations and virtual pet games such as Little Computer People, The Sims, and Tamagotchi. It also borrows from tabletop role-playing games such as Dungeons & Dragons, which offer an open-ended storytelling experience that video games lack.

Unbounded's game mechanics revolve around character simulation and open-ended interactions, as shown in Figure 2.

Players can insert their own characters into the game, defining each character's appearance and personality. The game generates a world in which these characters can explore the environment, interact with objects, and hold conversations. Based on the player's actions and choices, it generates new scenarios, stories, and challenges, creating a personalized and limitless experience. The figure below shows several gameplay examples.

Specifically, Unbounded has the following features:

1. Character Personalization: Players can insert their own characters into the game and define their appearance and personality.

2. Game Environment Generation: Unbounded generates a persistent world for characters to explore and interact with.

3. Open-ended interaction: Players can interact with characters using natural language commands, and there are no predefined rules to limit interaction.

4. Real-time generation: The team stresses the importance of game speed. The final implementation runs 5-10x faster than the initial one, with a latency of about one second per new scene.

To achieve this, the team introduced technical innovations in both the language model and visual generation.

Methodology

Unbounded is an interactive generative infinite game powered by text-image generation models and large language models.

Unbounded includes:

(1) Personalized custom characters: users create unique characters with customizable appearance and personality;

(2) Dynamic world creation: the system generates a persistent interactive game world for exploration;

(3) Open interaction: Players interact with characters through natural language, and the game dynamically generates new scenes and storylines according to player actions;

(4) Interactive-speed generation: the game runs in near real time, refreshing in roughly one second.

Latent consistency model

A key feature of Unbounded is that it offers real-time interaction even though the game is built entirely on generative models. This is achieved with a latent consistency model (LCM), which produces high-resolution images in just two diffusion steps. With LCM, Unbounded performs text-to-image (T2I) generation in real time, which is critical for an interactive experience that refreshes in roughly one second.
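To see why a couple of steps can suffice, here is a minimal sketch of multistep consistency sampling, the sampler from the original consistency-models work that LCM applies in a latent space. It is a toy, not Unbounded's code: the "image" is a single number, the data distribution is an assumed 1-D Gaussian N(mu, s^2) for which the consistency function has a closed form (variance-exploding formulation), and the noise levels 80.0 and 1.0 are illustrative choices. A real LCM replaces the closed-form function with a distilled network operating on image latents.

```python
import math
import random
from typing import Callable, Sequence

ConsistencyFn = Callable[[float, float], float]

def gaussian_consistency_fn(mu: float, s: float) -> ConsistencyFn:
    """Closed-form consistency function for toy 1-D data x0 ~ N(mu, s^2) under the
    variance-exploding probability-flow ODE: it maps a noisy sample x_t at noise
    level t directly back to the start of its ODE trajectory."""
    def f(x_t: float, t: float) -> float:
        return mu + (x_t - mu) * s / math.sqrt(s * s + t * t)
    return f

def multistep_consistency_sample(
    f: ConsistencyFn,
    noise_levels: Sequence[float],
    sigma_min: float = 0.002,
    seed: int = 0,
) -> float:
    """Multistep consistency sampling: exactly one call to f per noise level."""
    rng = random.Random(seed)
    t_max = noise_levels[0]
    x = f(rng.gauss(0.0, t_max), t_max)  # step 1: pure noise -> clean estimate
    for t in noise_levels[1:]:
        # re-noise the estimate to a lower level t, then jump back to clean in one call
        x_t = x + math.sqrt(t * t - sigma_min * sigma_min) * rng.gauss(0.0, 1.0)
        x = f(x_t, t)
    return x

# Two noise levels -> two network evaluations, mirroring two-step generation.
sample = multistep_consistency_sample(gaussian_consistency_fn(mu=3.0, s=0.5), [80.0, 1.0])
```

The cost of generation is the length of `noise_levels`, which is why a consistency model can trade a long denoising chain for one or two evaluations; the extra re-noise-and-denoise step refines the first estimate.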

Regional IP-Adapter with block drop

Another key feature of Unbounded is its ability to generate a character within a predefined environment and have it perform different actions according to the user's instructions.

In games, characters and environments must stay consistent, and handling both at once poses several challenges.

The study found that existing methods could not meet all of these requirements at interactive speed. The paper therefore proposes a novel regional IP-Adapter that consistently places a character in a predefined environment while following the text prompt.

The study proposes an improved IP-Adapter that conditions on both the subject (the character) and the environment, allowing a predefined character to be generated in a user-specified environment. Unlike the original IP-Adapter, which conditions on a single image, the proposed method introduces dual conditioning and a dynamic regional injection mechanism so that both concepts are represented in the generated image.

For example, given the text prompt "Desert under the sky, witch makes cactus bloom with bright, glowing flowers" and an image of a desert environment, as shown in Figure 4, the model must place the character (the witch) next to the cactus and generate the cactus and flowers within the desert environment.

This requires the model to (1) preserve the environment, (2) preserve the character, and (3) follow the prompt. However, encoding the environment with an IP-Adapter can severely degrade the character's appearance ((2) and (3) in Figure 8).

The regional IP-Adapter addresses this problem effectively. Specifically, the paper introduces a dynamic mask derived from the cross-attention between the character's text embedding and the hidden states at each layer of the model. As shown in Figure 4, the adapters are applied separately to the regions corresponding to the environment and the character, preventing the environment condition from interfering with the character's appearance and vice versa.

For the regional IP-Adapter, the dynamic mask comes from the cross-attention between the character's text and the hidden states, and its quality is key to separating character generation from environment generation. Figure 5 shows the attention maps between the character's text embedding and the hidden states in the cross-attention layers of the downsampling blocks. In these blocks, attention is not concentrated on the character but spread across the entire image. This suggests that the diffusion model does not separate character and environment generation in these layers; instead, it focuses on the overall image structure given the text prompt.
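A minimal NumPy sketch of the regional idea, under stated assumptions: it follows the standard IP-Adapter pattern of adding an image cross-attention term to the text cross-attention term, but routes the character-image term inside a mask and the environment-image term outside it. The mask is taken, as described above, from cross-attention between the hidden states and the character's text tokens. The summing, min-max normalization, 0.5 threshold, and all function names are hypothetical choices; the queries `q` stand in for already-projected hidden states, and block drop is not modeled.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product cross-attention: q (N, d), k (M, d), v (M, d) -> (N, d)."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def character_mask(q, text_k, char_token_ids, threshold=0.5):
    """Binary (N,) mask of spatial positions whose text attention lands on character tokens."""
    probs = softmax(q @ text_k.T / np.sqrt(q.shape[-1]))  # (N, T) attention over text tokens
    char_map = probs[:, char_token_ids].sum(axis=-1)       # attention mass on character tokens
    lo, hi = char_map.min(), char_map.max()
    norm = (char_map - lo) / (hi - lo + 1e-8)              # min-max normalize to [0, 1]
    return (norm > threshold).astype(q.dtype)

def regional_ip_attention(q, text_kv, char_kv, env_kv, char_token_ids, scale=1.0):
    """Text attention plus IP-Adapter image attention, routed by region:
    character-image features inside the mask, environment-image features outside it."""
    mask = character_mask(q, text_kv[0], char_token_ids)[:, None]  # (N, 1)
    text_out = attend(q, *text_kv)
    char_out = attend(q, *char_kv)
    env_out = attend(q, *env_kv)
    out = text_out + scale * (mask * char_out + (1.0 - mask) * env_out)
    return out, mask[:, 0]

# Toy example: 4 spatial positions, 2 text tokens (token 0 = the character word).
q = np.array([[5.0, 0.0], [0.0, 5.0], [4.0, 0.0], [0.0, 4.0]])
text_kv = (np.array([[1.0, 0.0], [0.0, 1.0]]), np.zeros((2, 2)))
char_kv = (np.eye(2), np.ones((2, 2)))   # character-image values: all ones
env_kv = (np.eye(2), np.zeros((2, 2)))   # environment-image values: all zeros
out, mask = regional_ip_attention(q, text_kv, char_kv, env_kv, [0])
```

In the toy example, positions 0 and 2 attend mostly to the character token, so they receive the character-image features and the others receive the environment-image features. In a real diffusion U-Net this routing would happen inside each cross-attention layer, with separate key/value projections for each image embedding.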

A language model game engine with open interactions and integrated game mechanics

The study builds the character life simulation game around two LLM agents.

Experiments and results

For evaluation, the study used GPT-4o to build a dataset of 5,000 triples (character image, environment description, text prompt). It covers 5 characters (dog, cat, panda, witch, and wizard), 100 different environments, and 1,000 text prompts (10 per environment).
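The dataset sizes fit together as a simple grid. The sketch below enumerates it with placeholder names (the image file names, environment strings, and prompt strings are hypothetical stand-ins; the article gives only the counts and the five character types), assuming every character is paired with every environment-prompt pair: 5 × 100 × 10 = 5,000 triples.

```python
from itertools import product

# Placeholder stand-ins: the article gives the counts and character types, not the data.
CHARACTERS = ["dog", "cat", "panda", "witch", "wizard"]
ENVIRONMENTS = [f"environment_{i:03d}" for i in range(100)]
PROMPTS_PER_ENVIRONMENT = 10

def build_triples():
    """Return (character image, environment description, text prompt) triples."""
    prompts = {
        env: [f"{env}: prompt {j}" for j in range(PROMPTS_PER_ENVIRONMENT)]
        for env in ENVIRONMENTS
    }
    return [
        (f"{char}.png", env, prompt)
        for char, env in product(CHARACTERS, ENVIRONMENTS)
        for prompt in prompts[env]
    ]

triples = build_triples()
print(len(triples))  # 5000
```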

Comparison of environment consistency and character consistency

In this experiment, the authors mainly compare the regional IP-Adapter with block drop against previous methods.

As shown in Table 1, the proposed approach consistently outperforms previous approaches in both environment consistency and character consistency, while achieving comparable performance in semantic alignment with the text prompt.

Specifically, for character consistency, the proposed method clearly outperforms StoryDiffusion on CLIP-I^C and beats StoryDiffusion by 0.057 on DreamSim^C. For environment consistency, the proposed method also outperforms the other methods.
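The consistency metrics compare image embeddings. As a minimal sketch, assuming CLIP-I follows its common definition (mean cosine similarity between the CLIP image embedding of a reference image and those of the generated images) and that the superscripts C and E denote a character or environment reference image, the score can be computed from precomputed embeddings as below. Obtaining the embeddings with a CLIP image encoder is not shown, and the function names are hypothetical. DreamSim, by contrast, is a learned perceptual distance, so lower values are better there.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def clip_i_score(reference: Sequence[float], generated: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between one reference embedding and each generated-image embedding."""
    return sum(cosine_similarity(reference, g) for g in generated) / len(generated)

print(clip_i_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```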

Figure 7 shows a qualitative comparison with other methods. The regional IP-Adapter with block drop consistently produces images with a consistent character, whereas other methods may omit the character or render it with an inconsistent appearance. The comparison also shows that the proposed method balances environment consistency and character consistency well, while other methods may generate environments that differ from the conditioning environment.

Effectiveness of the dynamic regional IP-Adapter with block drop

The experiments show that the regional IP-Adapter with block drop is essential for following the text prompt when placing a character in an environment.

As shown in Table 2, adding block drop improves both environment and character consistency, raising CLIP-I^E by 0.291 and CLIP-I^C by 0.264, and also improves alignment between the text prompt and the generated image. In addition, the regional IP-Adapter further improves character consistency and text alignment while keeping environment consistency comparable.

Figure 8 shows the qualitative results. Conditioning on the environment with an IP-Adapter reconstructs the environment well, but character consistency suffers because the character takes on the style of the environment.

Block drop improves prompt following, producing a correct spatial layout of character and environment in the generated image. However, the character's appearance is still affected by the surrounding environment. Combining the proposed regional injection mechanism with the proposed dynamic masking scheme yields images with strong character consistency that still respect the environment condition.

Effectiveness of the distilled specialized LLM

Experiments have shown that the team's diverse user-simulator interaction data can effectively distill Gemma-2B into a powerful game engine.

As shown in Table 3, smaller LLMs (Gemma-2B, Llama3.2-3B) and a somewhat larger one (Gemma-7B) perform worse with zero-shot inference than the team's distilled model, suggesting that distilling from a more powerful LLM is effective for simulating the game world and character actions.

Moreover, the results show that the distilled model performs comparably to GPT-4o, which further supports the effectiveness of the method. The team also studied how the amount of distillation data affects performance by distilling Gemma-2B with 1K and 5K examples. Unsurprisingly, the larger dataset performed better across the board.
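The article says only that user-simulator interaction data was used to distill Gemma-2B. As a hedged sketch of what collecting such data could look like, the loop below lets a user-simulator agent and a stronger teacher game engine (GPT-4o is the article's comparison point) take turns, and records each (dialogue so far, teacher reply) pair as a supervised target for the student, which is ordinary sequence-level knowledge distillation. The callables, prompt format, and `Example` type are hypothetical; the stubs merely stand in for LLM calls, and the fine-tuning step itself is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    prompt: str  # dialogue so far, ending with the latest user action
    target: str  # the teacher engine's reply, which the student is trained to reproduce

def collect_distillation_data(
    user_sim: Callable[[int, List[str]], str],
    teacher_engine: Callable[[str], str],
    n_episodes: int,
    turns_per_episode: int,
) -> List[Example]:
    data: List[Example] = []
    for episode in range(n_episodes):
        history: List[str] = []
        for _ in range(turns_per_episode):
            action = user_sim(episode, history)  # simulated player chooses an action
            history.append(f"USER: {action}")
            prompt = "\n".join(history)
            reply = teacher_engine(prompt)       # teacher narrates the outcome
            data.append(Example(prompt=prompt, target=reply))
            history.append(f"GAME: {reply}")
    return data

# Hypothetical stub agents standing in for LLM calls.
def stub_user(episode: int, history: List[str]) -> str:
    return f"episode {episode}, action {len(history) // 2}"

def stub_teacher(prompt: str) -> str:
    return f"narration for: {prompt.splitlines()[-1]}"

dataset = collect_distillation_data(stub_user, stub_teacher, n_episodes=3, turns_per_episode=4)
```

In this sketch the dataset size is `n_episodes * turns_per_episode`, so scaling the data up simply means running more or longer simulated episodes.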

This article is from Xinzhi self-media and does not represent the views and positions of Business Xinzhi.