Learning Interactive Real-World Simulators

Authors: Sherry Yang, Yilun Du, Seyed Kamyar Seyed Ghasemipour, Jonathan Tompson, Leslie Pack Kaelbling, Dale Schuurmans, Pieter Abbeel

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate three specific use cases. We first show how the simulator enables a vision-language policy to perform long-horizon goal-conditioned tasks through hindsight relabeling of simulated experience (Andrychowicz et al., 2017). In addition to learning high-level vision-language policies, we illustrate how the simulator can enable learning low-level control policies by leveraging model-based reinforcement learning (RL) (Sutton, 1988)." Supporting experimental evidence appears in the table captions: "Table 1: Ablations of history conditioning using FVD, FID, Inception score, and CLIP score on Ego4D." and "Table 2: Evaluation of long-horizon actions. Reduction in distance to goal (RDG), defined in Equation 3, across 5 evaluation runs of a VLM trained on simulated long-horizon data (bottom row) compared to a VLM trained on original short-horizon data (top row)."
Researcher Affiliation | Collaboration | Sherry Yang (1,2), Yilun Du (3), Kamyar Ghasemipour (2), Jonathan Tompson (2), Leslie Kaelbling (3), Dale Schuurmans (2,4), Pieter Abbeel (1). Affiliations: 1 UC Berkeley, 2 Google DeepMind, 3 MIT, 4 University of Alberta.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | "Video demos can be found at https://universal-simulator.github.io." (Checking the website, it states "Coming soon: code release" as of the paper's publication, indicating the code was not yet available.)
Open Datasets | Yes | "see all datasets used to train UniSim in Appendix B." Table 5 lists multiple datasets with citations, e.g., "Habitat HM3D (Ramakrishnan et al., 2021)", "Ego4D (Grauman et al., 2022)", "LAION-400M (Schuhmann et al., 2021)".
Dataset Splits | Yes | "We ablate over choices of past frames to condition on using a validation split of the Ego4D dataset (Grauman et al., 2022)." "For generating data to train VLMs, we take the training split of ActivityNet Captions which consists of 30,740 text-video examples after the 50/25/25% train/val1/val2 split as in Chen et al. (2023)."
Hardware Specification | Yes | The UniSim model has 5.6B parameters and requires 512 TPU-v3 chips and 20 days to train on all data.
Software Dependencies | No | The paper mentions models (e.g., PaLI 3B) and algorithms (e.g., REINFORCE) but does not provide specific version numbers for software dependencies such as PyTorch, TensorFlow, or other implementation libraries.
Experiment Setup | Yes | "The model and training hyperparameters of UniSim are summarized in Table 6." These include a learning rate of 0.0001, a batch size of 256, 1,000,000 training steps, and dropout of 0.1.
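The reported hyperparameters can be collected into a small configuration object. This is an illustrative sketch, not the authors' code: the class and field names are assumptions, while the values are those quoted from Table 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniSimTrainConfig:
    """Training hyperparameters reported in Table 6 of the paper.

    The class and field names are illustrative; only the values
    come from the paper.
    """
    learning_rate: float = 1e-4   # "Learning rate 0.0001"
    batch_size: int = 256         # "Batch size 256"
    training_steps: int = 1_000_000  # "Training steps 1000000"
    dropout: float = 0.1          # "Dropout 0.1"
```

A frozen dataclass keeps the reported settings immutable and easy to log alongside reproduction runs.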
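The Research Type row cites hindsight relabeling of simulated experience (Andrychowicz et al., 2017) as the mechanism for producing long-horizon goal-conditioned training data. A minimal sketch of the generic relabeling idea is below; the function, field names, and binary reward are assumptions, not the paper's implementation.

```python
import random


def hindsight_relabel(episode, k=4):
    """Augment an episode with hindsight-relabeled transitions.

    `episode` is a list of dicts with keys 'obs', 'action', and
    'achieved_goal'. Each transition is kept as-is, and up to `k`
    copies are added whose 'goal' is an achieved state sampled from
    the remainder of the same episode, with a binary reward for
    matching it (the reward scheme here is illustrative).
    """
    relabeled = []
    for t, step in enumerate(episode):
        relabeled.append(dict(step))  # keep the original transition
        future = episode[t:]          # candidate substitute goals
        for _ in range(min(k, len(future))):
            goal_step = random.choice(future)
            new_step = dict(step)
            new_step["goal"] = goal_step["achieved_goal"]
            new_step["reward"] = (
                1.0 if step["achieved_goal"] == goal_step["achieved_goal"]
                else 0.0
            )
            relabeled.append(new_step)
    return relabeled
```

Applied to rollouts from a learned simulator, relabeling of this kind turns short simulated episodes into goal-conditioned supervision without extra environment interaction.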