Offline Imitation Learning with a Misspecified Simulator

Authors: Shengyi Jiang, Jingcheng Pang, Yang Yu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments are conducted in four MuJoCo locomotion environments with modified friction, gravity, and density configurations. Experimental results show that HIDIL achieves significant improvements in performance and stability in all of the real environments, compared with imitation learning methods and transfer methods in reinforcement learning.
Researcher Affiliation | Academia | Shengyi Jiang, Jing-Cheng Pang, Yang Yu; State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; {jiangsy, pangjc, yuy}@lamda.nju.edu.cn
Pseudocode | Yes | Algorithm 1 (Deployment Process of HIDIL) and Algorithm 2 (Training Procedure of HIDIL).
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described method. It notes that the baseline method [5] has no open-source code, implying a similar situation for their own method.
Open Datasets | Yes | "We evaluate the efficacy of our algorithm with four continuous-control locomotion tasks from MuJoCo [13]." (Citation [13]: Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In Proceedings of the 25th International Conference on Intelligent Robots and Systems, pages 5026-5033, 2012.)
Dataset Splits | No | The paper mentions collecting "10 expert demonstration trajectories" and sampling "4M timesteps in E_sim", but does not provide train/validation/test dataset splits (percentages, counts, or citations to predefined splits) needed to reproduce the data partitioning.
Hardware Specification | No | The paper mentions using the MuJoCo physics simulator but does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper names software components such as MuJoCo, OpenAI Gym, and Proximal Policy Optimization (PPO), but does not provide version numbers for these or any other ancillary software dependencies, which are necessary for replication.
Experiment Setup | Yes | "The horizon H in HIDIL is set to 5 and T is set to 1 by default across all tasks and dynamics configurations. Every method trained in the simulator samples 4M timesteps in E_sim. The methods that require supervised training, i.e. HID, are all trained until convergence."
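The modified-dynamics evaluation described under Research Type (four MuJoCo locomotion tasks, each perturbed in friction, gravity, or density) can be sketched as a small enumeration. The environment names and multiplier values below are illustrative assumptions, not the paper's exact grid:

```python
# Hypothetical sketch of the dynamics-mismatch evaluation grid.
# The environment names and multipliers are assumptions for illustration;
# the paper specifies only that friction, gravity, and density are modified.
DYNAMICS_VARIANTS = {
    "friction": [0.5, 1.5],  # multiplier on contact friction
    "gravity":  [0.5, 1.5],  # multiplier on gravity magnitude
    "density":  [0.5, 1.5],  # multiplier on body density
}

def variant_names(envs=("HalfCheetah", "Hopper", "Walker2d", "Ant")):
    """Enumerate (environment, parameter, multiplier) test configurations."""
    return [
        (env, param, m)
        for env in envs
        for param, ms in DYNAMICS_VARIANTS.items()
        for m in ms
    ]
```

Each tuple would correspond to one "real" environment whose dynamics differ from the unmodified simulator.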
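The quoted experiment setup can be collected into a single configuration fragment. The key names here are hypothetical; only the values (H = 5, T = 1, 4M simulator timesteps, 10 expert trajectories) come from the report:

```python
# Hypothetical default configuration reflecting the reported setup.
# Key names are illustrative, not the authors' actual variable names.
HIDIL_CONFIG = {
    "horizon_H": 5,              # horizon H, default across all tasks
    "T": 1,                      # T, default across all tasks
    "sim_timesteps": 4_000_000,  # timesteps sampled in E_sim per method
    "expert_trajectories": 10,   # expert demonstrations collected
}
```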