Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation
Authors: David Brandfonbrener, Ofir Nachum, Joan Bruna
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical evidence of this claim through evaluations on a variety of simulated visuomotor manipulation problems. First, we conduct an extensive empirical evaluation and introspection of the candidate algorithms along with several strong baselines. |
| Researcher Affiliation | Collaboration | David Brandfonbrener (New York University); Ofir Nachum (Google); Joan Bruna (New York University) |
| Pseudocode | No | The paper describes the algorithms using mathematical equations but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Full details about the methodology are in Appendix C and code is at https://github.com/davidbrandfonbrener/imitation_pretraining. |
| Open Datasets | Yes | Our suite consists of six different pretraining datasets on varied tasks and of varied size. The datasets are described in full detail in Appendix C. All of our environments are based on the MuJoCo simulator [Todorov et al., 2012]. The point mass environment is derived from the DM control suite [Tunyasuvunakool et al., 2020]. The kitchen environment and dataset was introduced in Gupta et al. [2019]. The rest of the environments are taken from Metaworld [Yu et al., 2020]. |
| Dataset Splits | Yes | First there is a large multi-context pretraining dataset that will be used for representation learning, specifically to learn an observation encoder. Second, there is a small single-context finetuning dataset for policy learning on top of the pretrained representation. Results presented in Figure 9 show the average across datasets of the cross-representation prediction error on a validation set from the pretraining distribution (normalized by the mean prediction error on each dataset). |
| Hardware Specification | Yes | Pretraining was all done on an internal cluster using RTX8000 GPUs. Finetuning and evaluation were all done on an internal cluster on CPU (since the finetuned policy network is small and environments run on CPU). |
| Software Dependencies | No | The paper mentions software like JAX, flax, optax, MuJoCo, and the DM control suite, but does not provide specific version numbers for these software dependencies, only citations to their papers. |
| Experiment Setup | Yes | Training hyperparameters. For pretraining, we split the datasets into two categories: easy (point mass, pick and place, and door) and hard (kitchen, metaworld-ml45, and metaworld-r3m). On the easy tasks we train for 100k gradient steps and on the hard tasks we train for 200k gradient steps. Batch size is 256 for all methods except explicit forward dynamics where (due to the added compute required for the decoder) we use a batch size of 128 to even out computational requirements across methods. All methods are trained with the adamw optimizer with learning rate 1e-3, a cosine learning rate decay schedule, and default weight decay of 1e-4. |
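
The quoted experiment setup translates directly into an optimizer configuration. Below is a minimal sketch of how that pretraining optimizer could be assembled with optax (which the paper cites among its dependencies); the constant names and the exact schedule arguments are assumptions for illustration, not the authors' released code.

```python
# Sketch of the pretraining optimizer described in the Experiment Setup row,
# assuming optax. Values are taken from the quoted hyperparameters; anything
# not stated there (e.g., the final decay value of the cosine schedule) is an
# assumption.
import optax

NUM_STEPS = 100_000      # 200_000 gradient steps for the "hard" datasets
LEARNING_RATE = 1e-3
WEIGHT_DECAY = 1e-4
BATCH_SIZE = 256         # 128 for explicit forward dynamics (heavier decoder)

# Cosine decay of the learning rate over the full training run.
schedule = optax.cosine_decay_schedule(
    init_value=LEARNING_RATE,
    decay_steps=NUM_STEPS,
)

# AdamW with decoupled weight decay, driven by the cosine schedule.
optimizer = optax.adamw(learning_rate=schedule, weight_decay=WEIGHT_DECAY)
```

In a typical JAX training loop, `optimizer.init(params)` would create the optimizer state and `optimizer.update(grads, opt_state, params)` would be called once per gradient step for `NUM_STEPS` iterations over batches of `BATCH_SIZE` transitions.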