Offline Imitation Learning with a Misspecified Simulator
Authors: Shengyi Jiang, Jingcheng Pang, Yang Yu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted in four MuJoCo locomotion environments with modified friction, gravity, and density configurations. Results show that HIDIL achieves significant improvements in performance and stability across all of the real environments, compared with imitation learning methods and transfer methods in reinforcement learning. |
| Researcher Affiliation | Academia | Shengyi Jiang, Jing-Cheng Pang, Yang Yu State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China {jiangsy, pangjc, yuy}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Deployment Process of HIDIL and Algorithm 2 Training Procedure of HIDIL |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. It notes only that the baseline method [5] lacks open-source code, and gives no repository link for its own method. |
| Open Datasets | Yes | We evaluate the efficacy of our algorithm with four continuous-control locomotion tasks from MuJoCo [13]. (Citation [13]: Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In Proceedings of 25th International Conference on Intelligent Robots and Systems, pages 5026–5033, 2012.) |
| Dataset Splits | No | The paper mentions collecting '10 expert demonstration trajectories' and sampling '4M timesteps in Esim' but does not provide specific train/validation/test dataset splits (percentages, counts, or citations to predefined splits) to reproduce data partitioning. |
| Hardware Specification | No | The paper mentions using the 'MuJoCo physics simulator' but does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as 'MuJoCo', 'OpenAI Gym', and 'Proximal Policy Optimization (PPO)' but does not provide specific version numbers for these or any other ancillary software dependencies, which are necessary for replication. |
| Experiment Setup | Yes | The horizon H in HIDIL is set to 5 and T is set to 1 by default across all tasks and dynamics configurations. Every method trained in the simulator samples 4M timesteps in Esim. The methods that require supervised training, i.e. HID, are all trained until convergence. |
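Since the paper releases no code, the modified-dynamics setup it describes (MuJoCo tasks with perturbed friction, gravity, and density) can only be sketched. Below is a minimal, hypothetical illustration of how such dynamics variants might be enumerated as scaled configurations; the nominal values and scale factors are illustrative assumptions, not the paper's actual coefficients.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DynamicsConfig:
    """Nominal simulator dynamics for one locomotion task (illustrative units)."""
    friction: float
    gravity: float   # gravitational acceleration magnitude, m/s^2
    density: float   # body density, kg/m^3

def perturb(nominal: DynamicsConfig,
            friction_scale: float = 1.0,
            gravity_scale: float = 1.0,
            density_scale: float = 1.0) -> DynamicsConfig:
    """Return a modified-dynamics variant of a task, in the spirit of the
    paper's friction/gravity/density perturbations. Scale factors here are
    placeholders; the paper's exact configurations are not restated."""
    return replace(
        nominal,
        friction=nominal.friction * friction_scale,
        gravity=nominal.gravity * gravity_scale,
        density=nominal.density * density_scale,
    )

# Hypothetical nominal values for a HalfCheetah-like task.
nominal = DynamicsConfig(friction=0.4, gravity=9.81, density=1000.0)

# One misspecified-simulator variant: half the friction, 1.5x the density.
variant = perturb(nominal, friction_scale=0.5, density_scale=1.5)
```

In an actual reproduction these scaled values would be written back into the MuJoCo model (e.g. geom friction and density attributes and the world gravity option) before rolling out the 4M simulator timesteps noted above.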