Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation
Authors: David Brandfonbrener, Ofir Nachum, Joan Bruna
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical evidence of this claim through evaluations on a variety of simulated visuomotor manipulation problems. First, we conduct an extensive empirical evaluation and introspection of the candidate algorithms along with several strong baselines. |
| Researcher Affiliation | Collaboration | David Brandfonbrener (New York University); Ofir Nachum (Google); Joan Bruna (New York University) |
| Pseudocode | No | The paper describes the algorithms using mathematical equations but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Full details about the methodology are in Appendix C and code is at https://github.com/davidbrandfonbrener/imitation_pretraining. |
| Open Datasets | Yes | Our suite consists of six different pretraining datasets on varied tasks and of varied size. The datasets are described in full detail in Appendix C. All of our environments are based on the MuJoCo simulator [Todorov et al., 2012]. The point mass environment is derived from the DM control suite [Tunyasuvunakool et al., 2020]. The kitchen environment and dataset was introduced in Gupta et al. [2019]. The rest of the environments are taken from Metaworld [Yu et al., 2020]. |
| Dataset Splits | Yes | First there is a large multi-context pretraining dataset that will be used for representation learning, specifically to learn an observation encoder. Second, there is a small single-context finetuning dataset for policy learning on top of the pretrained representation. Results presented in Figure 9 show the average across datasets of the cross-representation prediction error on a validation set from the pretraining distribution (normalized by the mean prediction error on each dataset). |
| Hardware Specification | Yes | Pretraining was all done on an internal cluster using RTX8000 GPUs. Finetuning and evaluation were all done on an internal cluster on CPU (since the finetuned policy network is small and environments run on CPU). |
| Software Dependencies | No | The paper mentions software like JAX, flax, optax, MuJoCo, and the DM control suite, but does not provide specific version numbers for these software dependencies, only citations to their papers. |
| Experiment Setup | Yes | Training hyperparameters. For pretraining, we split the datasets into two categories: easy (point mass, pick and place, and door) and hard (kitchen, metaworld-ml45, and metaworld-r3m). On the easy tasks we train for 100k gradient steps and on the hard tasks we train for 200k gradient steps. Batch size is 256 for all methods except explicit forward dynamics where (due to the added compute required for the decoder) we use a batch size of 128 to even out computational requirements across methods. All methods are trained with the adamw optimizer with learning rate 1e-3, a cosine learning rate decay schedule, and default weight decay of 1e-4. |
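
The quoted experiment setup translates directly into an optimizer configuration. Below is a minimal sketch of how that pretraining optimizer could be assembled with optax (which the paper cites among its dependencies); the constant names and the exact schedule arguments are assumptions for illustration, not the authors' released code.

```python
# Sketch of the pretraining optimizer described in the Experiment Setup row,
# assuming optax. Values are taken from the quoted hyperparameters; anything
# not stated there (e.g., the final decay value of the cosine schedule) is an
# assumption.
import optax

NUM_STEPS = 100_000      # 200_000 gradient steps for the "hard" datasets
LEARNING_RATE = 1e-3
WEIGHT_DECAY = 1e-4
BATCH_SIZE = 256         # 128 for explicit forward dynamics (heavier decoder)

# Cosine decay of the learning rate over the full training run.
schedule = optax.cosine_decay_schedule(
    init_value=LEARNING_RATE,
    decay_steps=NUM_STEPS,
)

# AdamW with decoupled weight decay, driven by the cosine schedule.
optimizer = optax.adamw(learning_rate=schedule, weight_decay=WEIGHT_DECAY)
```

In a typical JAX training loop, `optimizer.init(params)` would create the optimizer state and `optimizer.update(grads, opt_state, params)` would be called once per gradient step for `NUM_STEPS` iterations over batches of `BATCH_SIZE` transitions.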