Robust Imitation of a Few Demonstrations with a Backwards Model

Authors: Jung Yeon Park, Lawson Wong

NeurIPS 2022

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | On continuous control domains, we evaluate the robustness when starting from different initial states unseen in the demonstration data. While both our method and other imitation learning baselines can successfully solve the tasks for initial states in the training distribution, our method exhibits considerably more robustness to different initial states.

Researcher Affiliation | Academia | Jung Yeon Park, Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA (park.jungy@northeastern.edu); Lawson L.S. Wong, Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA (lsw@ccs.neu.edu)

Pseudocode | Yes | Algorithm 1: Backwards Model-based Imitation Learning (BMIL)

Open Source Code | Yes | Our code for the modified environments, generating expert policies, and running all experiments is available at https://github.com/jypark0/bmil.

Open Datasets | No | The paper describes generating demonstrations using expert policies trained by the authors or pretrained policies from other works, but it does not provide concrete access (a link, DOI, or specific citation to a public dataset) to the demonstration datasets themselves.

Dataset Splits | No | The paper does not explicitly define or use train/validation/test splits for the demonstration data. It describes training on demonstrations and model rollouts and then evaluating robustness by varying initial states, but no distinct validation set for hyperparameter tuning or early stopping is mentioned.

Hardware Specification | No | The acknowledgments mention the "Discovery cluster, supported by Northeastern University's Research Computing team", but the paper does not specify the CPU or GPU models or the memory used for the experiments.

Software Dependencies | No | The paper mentions using the MuJoCo simulator and neural networks implemented as MLPs with ReLU activations, but it does not specify exact version numbers for any software, libraries, or dependencies used in the experiments.

Experiment Setup | Yes | To train the policy, we use pd = 0.5 for the Fetch environments, pd = 0.8–0.95 for the Maze environments, and pd = 0.8 for the Adroit environments. For the policy, we use neural networks with 3 fully connected hidden layers with 256 neurons and ReLU activations. For the backwards model, we use 4-layer MLPs with 256 hidden units for both the action predictor BA and previous state predictor BS and use diagonal Gaussian distributions. (See the sketch after this table.)
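The Experiment Setup excerpt gives enough detail to sketch the two network components in code. The following is a minimal sketch only, assuming PyTorch (the paper does not state its framework), made-up class and variable names, example state/action dimensions, a reading of "4-layer MLPs" as four hidden layers, and a guess at what the backwards-model components condition on; only the layer widths, depths, ReLU activations, and diagonal-Gaussian outputs come from the quoted text.

```python
# Sketch of the network shapes quoted in the Experiment Setup row.
# Assumptions (not from the paper): PyTorch, all names, the example
# dimensions, the 4-hidden-layer reading of "4-layer MLPs", and the
# conditioning of B_A / B_S. Widths, depths, ReLU activations, and
# diagonal-Gaussian outputs follow the quoted excerpt.
import torch
import torch.nn as nn
from torch.distributions import Normal


def mlp(in_dim, out_dim, hidden=256, n_hidden=3):
    """Fully connected net with `n_hidden` ReLU hidden layers of width `hidden`."""
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


class DiagonalGaussianHead(nn.Module):
    """Wraps an MLP so its output parameterizes a diagonal Gaussian."""

    def __init__(self, in_dim, out_dim, n_hidden):
        super().__init__()
        self.net = mlp(in_dim, 2 * out_dim, n_hidden=n_hidden)

    def forward(self, x):
        mean, log_std = self.net(x).chunk(2, dim=-1)
        return Normal(mean, log_std.clamp(-5, 2).exp())


state_dim, action_dim = 10, 4  # example sizes, chosen arbitrarily

# Policy: 3 fully connected hidden layers of 256 units with ReLU activations.
# (The excerpt does not say how the policy output is parameterized, so a plain
# action output is used here.)
policy = mlp(state_dim, action_dim, n_hidden=3)

# Backwards model: 4-layer MLPs with 256 hidden units for the action predictor
# B_A and the previous-state predictor B_S, both with diagonal Gaussian outputs.
backwards_action = DiagonalGaussianHead(state_dim, action_dim, n_hidden=4)              # B_A(a | s')
backwards_state = DiagonalGaussianHead(state_dim + action_dim, state_dim, n_hidden=4)   # B_S(s | s', a)

# One backward rollout step starting from a (here random, stand-in) demonstration state:
s_next = torch.randn(1, state_dim)
a = backwards_action(s_next).sample()
s_prev = backwards_state(torch.cat([s_next, a], dim=-1)).sample()
```

The backward rollout step at the end mirrors the report's note that the policy is trained on demonstrations plus model rollouts; how those rollouts are mixed into policy training (for example, the exact role of pd) is not spelled out in the quoted excerpts and is therefore left out of the sketch.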