Mind's Eye: Grounded Language Model Reasoning through Simulation
Authors: Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 39 tasks in a physics alignment benchmark demonstrate that Mind's Eye can improve reasoning ability by a large margin (27.9% zero-shot and 46.0% few-shot absolute accuracy improvement on average). |
| Researcher Affiliation | Collaboration | Google Research, Brain Team; Dartmouth College |
| Pseudocode | No | The paper describes the components and their interactions in prose but does not include any formal pseudocode blocks or algorithms. |
| Open Source Code | No | For reproducibility, we run experiments mainly with publicly available LMs (e.g., GPT-3) and choose baseline methods that have open-sourced implementation. This statement refers to the use of existing open-source tools, not the release of the authors' own code. |
| Open Datasets | No | We propose a new multi-task physics alignment dataset, UTOPIA... The ground-truth answers to the questions are generated by the physics engine, which makes it easy to scale UTOPIA to larger sizes. The paper introduces a new dataset but does not provide specific access information (link, DOI, citation) to a publicly available version of the UTOPIA dataset used in their experiments. |
| Dataset Splits | No | For the convenience of benchmarking on huge LMs, we prepare 100 samples for each sub-task, resulting in a dataset with about 3,900 samples. We use this version of UTOPIA for evaluation across the paper. The paper uses a dataset for evaluation but does not specify a separate validation split or its size/percentages for model training/hyperparameter tuning. |
| Hardware Specification | Yes | The MuJoCo simulations can achieve 171 fps on one A6000 GPU... All experiments for PaLM are run on TPU-v4 Pods... Training of the JAX-based text-to-code LMs runs on TPU-v3 Pods. |
| Software Dependencies | No | The paper mentions "DeepMind's MuJoCo" as a physics engine and "JAX-based text-to-code LMs" but does not specify version numbers for these or other software dependencies (e.g., Python, JAX, MuJoCo). |
| Experiment Setup | Yes | The learning rates we use for training 0.3B and 1.5B LMs on C4 are {3.0e-4, 1.8e-4}, which are switched to {1.8e-4, 0.5e-4} when fine-tuning on the text-code pairs. We use cosine annealing to control learning rate over time with fixed warm-up steps (3k). See the schedule sketch below this table. |
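
The Experiment Setup row quotes a cosine-annealed learning-rate schedule with a fixed 3k-step warm-up. Below is a minimal sketch of such a schedule in plain Python. The linear shape of the warm-up, the `total_steps` value, and the zero floor learning rate are illustrative assumptions, not details stated in the paper.

```python
import math

def lr_schedule(step, base_lr, total_steps, warmup_steps=3000, min_lr=0.0):
    """Cosine annealing with a fixed warm-up phase.

    Sketch of the schedule described in the paper's setup; the warm-up shape
    (linear here), total_steps, and min_lr are assumptions for illustration.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warm-up steps (assumed shape).
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Example: the fine-tuning learning rate quoted for the 0.3B LM (1.8e-4),
# with a hypothetical 100k-step training run.
print(lr_schedule(step=10_000, base_lr=1.8e-4, total_steps=100_000))
```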