DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
Authors: Zichen Cui, Hengkai Pan, Aadhithya Iyer, Siddhant Haldar, Lerrel Pinto
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our dynamics-pretrained visual representation on a suite of simulated and real benchmarks. We compare DynaMo representations with pretrained representations for vision and control, as well as other self-supervised learning methods. Our experiments are designed to answer the following questions: (a) Does DynaMo improve downstream policy performance? (b) Do representations trained with DynaMo work on real robotic tasks? (c) Is DynaMo compatible with different policy classes? (d) Can pretrained weights be fine-tuned in domain with DynaMo? (e) How important is each component in DynaMo? |
| Researcher Affiliation | Academia | Zichen Jeff Cui Hengkai Pan Aadhithya Iyer New York University Siddhant Haldar Lerrel Pinto |
| Pseudocode | No | The paper provides an architectural diagram (Figure 3) and describes the training process in text, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | All of our datasets, and training and evaluation code will be made publicly available. |
| Open Datasets | Yes | We evaluate DynaMo on four simulated benchmarks: Franka Kitchen [27], Block Pushing [28], Push-T [3], and LIBERO Goal [29], as well as eight robotic manipulation tasks in two real-world environments. |
| Dataset Splits | No | The paper details total demonstration trajectories for each environment (e.g., 'The dataset has 566 demonstration trajectories' for Franka Kitchen) but does not specify explicit train/validation/test dataset splits used for its experiments. |
| Hardware Specification | Yes | Compute used for training DynaMo: Franka Kitchen: 3 hours on 1x NVIDIA A100. Block Pushing: 7 hours on 1x NVIDIA A100. Push-T: 1 hour on 1x NVIDIA A100. LIBERO Goal: 2 hours on 1x NVIDIA H100. Allegro Manipulation: 3 minutes on 1x NVIDIA RTX A6000 for the sponge task, 4 minutes for the teabag task, and 3 minutes for the microwave task. xArm kitchen: 4 hours on 1x NVIDIA RTX A6000. |
| Software Dependencies | No | The paper lists several official implementation repositories for baselines (e.g., 'MoCo-v3: https://github.com/facebookresearch/moco-v3') and notes that their transformer encoder is based on 'nanoGPT [97]', but it does not provide specific version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We present the DynaMo hyperparameters below. Table 8: Environment-dependent hyperparameters for DynaMo pretraining, random init (Obs. context, EMA β, Forward dynamics dropout, Transition latent dim). Table 9: Shared hyperparameters for DynaMo pretraining, random init (Optimizer AdamW, Learning rate 10^-4, Weight decay 0.0, Betas (0.9, 0.999), Gradient clip norm 0.1, Covariance reg. coefficient 0.04, Epochs 40, Batch size 64). |
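The shared pretraining hyperparameters reported in Table 9 can be collected into a single config. The sketch below is illustrative only: the dictionary name, helper function, and keyword mapping are assumptions (the paper's training code is not yet released), but the values are taken directly from the row above and the keyword names match `torch.optim.AdamW`.

```python
# Shared DynaMo pretraining hyperparameters from Table 9, as a plain config
# dict. Names here are hypothetical; values are from the paper.
DYNAMO_SHARED_CONFIG = {
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "weight_decay": 0.0,
    "betas": (0.9, 0.999),
    "gradient_clip_norm": 0.1,          # applied via clip_grad_norm_
    "covariance_reg_coefficient": 0.04,  # weight on the covariance regularizer
    "epochs": 40,
    "batch_size": 64,
}


def make_optimizer_kwargs(cfg: dict) -> dict:
    """Map the config entries onto torch.optim.AdamW keyword arguments."""
    return {
        "lr": cfg["learning_rate"],
        "weight_decay": cfg["weight_decay"],
        "betas": cfg["betas"],
    }
```

A training script would then build the optimizer with `torch.optim.AdamW(model.parameters(), **make_optimizer_kwargs(DYNAMO_SHARED_CONFIG))` and combine it with the environment-dependent settings from Table 8.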