Genie: Generative Interactive Environments

Authors: Jake Bruce, Michael D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Maria Elisabeth Bechtle, Feryal Behbahani, Stephanie C.Y. Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando De Freitas, Satinder Singh, Tim Rocktäschel

ICML 2024

Each entry below pairs a reproducibility variable with its assessed result and the supporting LLM response (quoted from the paper).
Research Type: Experimental
"We train Genie on a large-scale dataset collected from publicly available Internet videos of 2D Platformer games (referred to from here on as Platformers). We construct the Platformers dataset by filtering publicly available videos for keywords relating to platformers, yielding 55M 16s video clips at 10FPS, with 160x90 resolution. The final dataset contains 6.8M 16s video clips (30k hours)... We examine the video generation performance of Genie via two factors, namely video fidelity, i.e. the quality of video generation, and controllability, i.e. how much impact the latent actions have in video generation. For video fidelity we use the Fréchet Video Distance (FVD)... For controllability, we devise a metric based on peak signal-to-noise ratio (PSNR) which we call Δt PSNR... In this section, we investigate the scaling behavior of our model. To this end, we conduct studies that explore the impact of both model size and batch size. See Appendix D for more details on architecture and compute usage. ... Ablation Studies"
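Since the model code is unreleased, the Δt PSNR controllability metric quoted above can be illustrated with a short sketch. This is a minimal reading of the paper's description (PSNR against the ground-truth frame under inferred vs. randomly sampled latent actions), not the authors' implementation; the array format and function names here are assumptions of this sketch.

```python
# Minimal sketch of the Δt PSNR controllability metric described above.
# Assumes frames are float arrays in [0, 1]; names are illustrative only.
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two frames."""
    mse = np.mean((x - y) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val**2 / mse)

def delta_t_psnr(x_t: np.ndarray, x_hat_t: np.ndarray, x_hat_rand_t: np.ndarray) -> float:
    """Drop in PSNR when ground-truth-inferred latent actions are replaced by
    randomly sampled ones: PSNR(x_t, x̂_t) - PSNR(x_t, x̂'_t). Larger values
    mean the latent actions have more influence, i.e. higher controllability."""
    return psnr(x_t, x_hat_t) - psnr(x_t, x_hat_rand_t)
```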
Researcher Affiliation: Collaboration
"¹Google DeepMind, ²University of British Columbia."
Pseudocode: No
The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: No
"We have chosen not to release the trained model checkpoints, the model's training dataset, or examples from that data to accompany this paper or the website."
Open Datasets: Yes
"We train Genie on a large-scale dataset collected from publicly available Internet videos of 2D Platformer games (referred to from here on as Platformers). ... To verify the generality of our method, we also consider the robotics datasets used to train RT1 (Brohan et al., 2023), combining their dataset of 130k robot demonstrations with a separate dataset of simulation data and the 209k episodes of real robot data from prior work (Kalashnikov et al., 2018)."
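The dataset-construction recipe quoted above (keyword filtering, then chunking into 16s clips at 10FPS and 160x90 resolution) can be sketched as follows. The metadata schema, keyword list, and helper names are hypothetical; the paper does not describe its filtering implementation.

```python
# Hypothetical sketch of the Platformers dataset filter described above.
# The metadata fields and keyword list are assumptions, not from the paper.
from dataclasses import dataclass

PLATFORMER_KEYWORDS = {"platformer", "speedrun", "level playthrough"}  # illustrative only

@dataclass
class VideoMeta:
    video_id: str
    title: str
    duration_s: float

def is_platformer(meta: VideoMeta) -> bool:
    """Keep a video if its title mentions any platformer-related keyword."""
    title = meta.title.lower()
    return any(kw in title for kw in PLATFORMER_KEYWORDS)

def chunk_into_clips(duration_s: float, clip_len_s: int = 16) -> list[tuple[int, int]]:
    """Split a video into non-overlapping 16s clip boundaries (start, end);
    frames would then be sampled at 10 FPS and resized to 160x90."""
    n = int(duration_s // clip_len_s)
    return [(i * clip_len_s, (i + 1) * clip_len_s) for i in range(n)]
```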
Dataset Splits: No
The paper mentions training data and test sets, but does not explicitly provide details about train/validation/test dataset splits with specific percentages or counts for the main experiments.
Hardware Specification: Yes
"As a result, for our final model, we train a 10.1B dynamics model with a batch size of 512, for a total of 125k steps, using 256 TPUv5p. ... For this experiment we make use of TPUv2 and TPUv3 (Jouppi et al., 2020)."
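The quoted figures imply the scale of data consumed during final training; the back-of-the-envelope calculation below is an inference from those numbers, not a figure stated in the paper.

```python
# Back-of-the-envelope training scale from the quoted figures.
batch_size = 512           # sequences per step (quoted)
steps = 125_000            # total training steps (quoted)
dataset_clips = 6_800_000  # 16s clips in the final Platformers dataset (quoted)

sequences_seen = batch_size * steps           # 64,000,000 training sequences
approx_epochs = sequences_seen / dataset_clips
print(f"{sequences_seen:,} sequences ≈ {approx_epochs:.1f} passes over the 6.8M clips")
# Note: sampled training sequences need not map one-to-one onto dataset clips,
# so "passes" is only a rough equivalence.
```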
Software Dependencies: No
"We make use of the DeepMind JAX ecosystem (Babuschkin et al., 2020) and specifically thank Andy Brock for building the internal framework we used for our model training and Arthur Brussee who provided an initial interface that enabled us to play our models." The paper does not provide specific version numbers for key software components.
Experiment Setup: Yes
"Table 7: Platformers video tokenizer hyperparameters. ... Table 8: Video tokenizer optimizer hyperparameters. ... Table 9: Dynamics model optimizer hyperparameters."
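The hyperparameter tables cited above are not reproduced here; the skeleton below only illustrates how one might organize them for a reproduction attempt. Every field name and default is a placeholder, not a value from Tables 7-9.

```python
# Placeholder config skeleton mirroring the structure of the cited tables
# (Table 7: tokenizer hyperparameters; Tables 8-9: optimizer hyperparameters).
# All values below are illustrative defaults, NOT the paper's settings.
from dataclasses import dataclass

@dataclass
class TokenizerConfig:   # cf. Table 7
    patch_size: int = 4
    codebook_size: int = 1024
    embedding_dim: int = 32

@dataclass
class OptimizerConfig:   # cf. Tables 8 and 9
    learning_rate: float = 3e-4
    batch_size: int = 256
    warmup_steps: int = 10_000
    weight_decay: float = 1e-4
```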