GROOT: Learning to Follow Instructions by Watching Gameplay Videos
Authors: Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate GROOT against open-world counterparts and human players on a proposed Minecraft Skill Forge benchmark. The Elo ratings clearly show that GROOT is closing the human-machine gap as well as exhibiting a 70% winning rate over the best generalist agent baseline. Qualitative analysis of the induced goal space further demonstrates some interesting emergent properties, including the goal composition and complex gameplay behavior synthesis. |
| Researcher Affiliation | Academia | Shaofei Cai (1,2), Bowei Zhang (3), Zihao Wang (1,2), Xiaojian Ma (5), Anji Liu (4), Yitao Liang (1); Team CraftJarvis. Affiliations: (1) Institute for Artificial Intelligence, Peking University; (2) School of Intelligence Science and Technology, Peking University; (3) School of Electronics Engineering and Computer Science, Peking University; (4) Computer Science Department, University of California, Los Angeles; (5) Beijing Institute for General Artificial Intelligence (BIGAI) |
| Pseudocode | No | The paper includes mathematical formulations, architectural diagrams (Figure 2), and detailed descriptions of components, but it does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper provides a project website URL (https://craftjarvis.github.io/GROOT) on the first page, but it does not contain an explicit statement by the authors confirming the release of their source code for the described methodology or a direct link to a code repository within the text. |
| Open Datasets | Yes | The contractor data is a Minecraft offline trajectory dataset provided by Baker et al. (2022), which is annotated by professional human players and used for training the inverse dynamics model (https://github.com/openai/Video-Pre-Training). |
| Dataset Splits | No | The paper states it uses the 'contractor data' for training but does not specify explicit training, validation, or test dataset splits in terms of percentages or absolute counts for its own model (GROOT). The evaluation is done on a new benchmark, not on a conventional validation split. |
| Hardware Specification | Yes | GPU types: NVIDIA RTX 4090Ti and A40; parallel GPUs: 8. |
| Software Dependencies | No | The paper names software components such as EfficientNet-B0 (CNN backbone), minGPT without a causal mask (encoder transformer), and Transformer-XL (decoder transformer), but does not provide specific version numbers for these libraries or frameworks. |
| Experiment Setup | Yes | Table 2 lists the hyperparameters for training GROOT: optimizer AdamW; weight decay 0.001; learning rate 0.0000181 (1.81e-5); warmup steps 2000; batch size 2 per GPU (128 total); training precision bf16; trajectory chunk size 128; attention memory size 256; KL loss weight 0.01. |
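The reported hyperparameters can be collected into a single training configuration for reference. The sketch below is illustrative only: the dict keys, the `linear_warmup_lr` helper, and the linear warmup shape are assumptions, since the paper reports the values (Table 2) but not the authors' code or the exact schedule shape.

```python
# Hyperparameters reported in Table 2 of the GROOT paper, gathered into one
# config dict. Key names are hypothetical; only the values come from the paper.
GROOT_TRAIN_CONFIG = {
    "optimizer": "AdamW",
    "weight_decay": 0.001,
    "learning_rate": 0.0000181,    # 1.81e-5 peak learning rate
    "warmup_steps": 2000,
    "batch_size_per_gpu": 2,
    "total_batch_size": 128,       # larger than 2 x 8 GPUs, which suggests
                                   # gradient accumulation (assumption)
    "precision": "bf16",
    "trajectory_chunk_size": 128,
    "attention_memory_size": 256,  # Transformer-XL memory length
    "kl_loss_weight": 0.01,
}

def linear_warmup_lr(step: int, cfg: dict = GROOT_TRAIN_CONFIG) -> float:
    """Ramp linearly from 0 to the peak learning rate over the warmup steps,
    then hold. The paper states only the warmup step count; the linear shape
    is a common default, not confirmed by the authors."""
    peak = cfg["learning_rate"]
    return peak * min(1.0, step / cfg["warmup_steps"])
```

Expressed this way, the schedule reaches the peak rate of 1.81e-5 exactly at step 2000 and stays flat afterwards.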