Offline Imitation Learning with a Misspecified Simulator
Authors: Shengyi Jiang, Jingcheng Pang, Yang Yu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted in four MuJoCo locomotion environments with modified friction, gravity, and density configurations. Results show that HIDIL achieves significant improvements in performance and stability across all of the real environments, compared with imitation learning methods and transfer methods in reinforcement learning. |
| Researcher Affiliation | Academia | Shengyi Jiang, Jing-Cheng Pang, Yang Yu State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China {jiangsy, pangjc, yuy}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Deployment Process of HIDIL and Algorithm 2 Training Procedure of HIDIL |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. It notes only that the baseline method [5] lacks open-source code, and gives no repository link for its own method. |
| Open Datasets | Yes | We evaluate the efficacy of our algorithm with four continuous-control locomotion tasks from MuJoCo [13]. (Citation [13]: Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In Proceedings of 25th International Conference on Intelligent Robots and Systems, pages 5026–5033, 2012.) |
| Dataset Splits | No | The paper mentions collecting '10 expert demonstration trajectories' and sampling '4M timesteps in Esim' but does not provide specific train/validation/test dataset splits (percentages, counts, or citations to predefined splits) to reproduce data partitioning. |
| Hardware Specification | No | The paper mentions using the 'MuJoCo physics simulator' but does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as 'MuJoCo', 'OpenAI Gym', and 'Proximal Policy Optimization (PPO)' but does not provide specific version numbers for these or any other ancillary software dependencies, which are necessary for replication. |
| Experiment Setup | Yes | The horizon H in HIDIL is set to 5 and T is set to 1 by default across all tasks and dynamics configurations. Every method trained in the simulator samples 4M timesteps in Esim. The methods that require supervised training, i.e. HID, are all trained until convergence. |
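Since the paper releases no code, the modified-dynamics setup it describes (MuJoCo tasks with perturbed friction, gravity, and density) can only be sketched. Below is a minimal, hypothetical illustration of how such dynamics variants might be enumerated as scaled configurations; the nominal values and scale factors are illustrative assumptions, not the paper's actual coefficients.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DynamicsConfig:
    """Nominal simulator dynamics for one locomotion task (illustrative units)."""
    friction: float
    gravity: float   # gravitational acceleration magnitude, m/s^2
    density: float   # body density, kg/m^3

def perturb(nominal: DynamicsConfig,
            friction_scale: float = 1.0,
            gravity_scale: float = 1.0,
            density_scale: float = 1.0) -> DynamicsConfig:
    """Return a modified-dynamics variant of a task, in the spirit of the
    paper's friction/gravity/density perturbations. Scale factors here are
    placeholders; the paper's exact configurations are not restated."""
    return replace(
        nominal,
        friction=nominal.friction * friction_scale,
        gravity=nominal.gravity * gravity_scale,
        density=nominal.density * density_scale,
    )

# Hypothetical nominal values for a HalfCheetah-like task.
nominal = DynamicsConfig(friction=0.4, gravity=9.81, density=1000.0)

# One misspecified-simulator variant: half the friction, 1.5x the density.
variant = perturb(nominal, friction_scale=0.5, density_scale=1.5)
```

In an actual reproduction these scaled values would be written back into the MuJoCo model (e.g. geom friction and density attributes and the world gravity option) before rolling out the 4M simulator timesteps noted above.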