Robust Imitation via Mirror Descent Inverse Reinforcement Learning
Authors: Dong-Sig Han, Hyunseo Kim, Hyundo Lee, JeHwan Ryu, Byoung-Tak Zhang
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our IRL method was applied on top of an adversarial framework, and it outperformed existing adversarial methods in an extensive suite of benchmarks. |
| Researcher Affiliation | Academia | Artificial Intelligence Institute, Seoul National University {dshan, hskim, hdlee, jhryu, btzhang}@bi.snu.ac.kr |
| Pseudocode | Yes | Algorithm 1 Mirror Descent Adversarial Inverse Reinforcement Learning. |
| Open Source Code | No | Our empirical studies can be reproduced from the detailed information in Appendices B and C. (This statement refers to 'detailed information' for reproduction, not explicitly to open-source code provided via a link or in the supplementary materials. Without checking the appendices, the main text gives no concrete access to the source code.) |
| Open Datasets | Yes | MuJoCo [19] benchmarks, and The MuJoCo simulator used in our experiments is freely available to everyone. See the site (https://mujoco.org). |
| Dataset Splits | No | The paper discusses training with expert demonstrations and different numbers of episodes but does not provide explicit details on train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | In experiments, each algorithm was executed on a CPU (a single thread). |
| Software Dependencies | No | The paper mentions software like RAC, SAC, and TensorFlow, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Input: trajectories {τ_t}_{t=1}^T, an agent π_θ, a reference policy π_ν, a neural network d_ξ: S → ℝ, a regularized reward function ψ_φ ∈ Ψ_Ω(Π), α_1, α_T, and λ. Fig. 5 shows that the Bregman divergence was large for MD-AIRL at the early training phase, because we chose the initial step size η_1 to be greater than 1 (α_1 = 0.5). MD-AIRL outperformed RAIRL in four cases by choosing an effectively low final step size η_T, less than 1 (α_T = 2). (An illustrative training-loop sketch based on these quoted inputs follows the table.) |
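
The Pseudocode and Experiment Setup rows quote Algorithm 1 (Mirror Descent Adversarial Inverse Reinforcement Learning) and its inputs. Below is a minimal, hedged sketch of a mirror-descent-style reference-policy update with a step-size schedule governed by α_1 and α_T, on a discrete toy problem. The function names, the α-to-η mapping (η_t = 1/α_t), and the logit-space interpolation are illustrative assumptions inferred from the quoted cells, not the authors' implementation, which additionally involves the adversarial networks d_ξ and ψ_φ and MuJoCo environments.

```python
# Hedged sketch of a mirror-descent reference-policy update (toy, discrete case).
# Names, the alpha->eta mapping, and the update rule are illustrative assumptions.
import numpy as np

def step_size_schedule(t, T, alpha_1, alpha_T):
    """Hypothetical schedule: interpolate alpha linearly and set eta_t = 1/alpha_t.

    Only the endpoints are grounded in the report (alpha_1 = 0.5 -> eta_1 > 1,
    alpha_T = 2 -> eta_T < 1); the exact schedule is defined in the paper.
    """
    alpha_t = alpha_1 + (alpha_T - alpha_1) * (t / max(T - 1, 1))
    return 1.0 / alpha_t

def mirror_descent_update(ref_logits, target_logits, eta):
    """Step under a negative-entropy mirror map: interpolate in the dual (logit)
    space, i.e. a geometric mixture of the two policies; eta > 1 extrapolates
    past the target."""
    return (1.0 - eta) * ref_logits + eta * target_logits

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions, T = 4, 3, 10
    alpha_1, alpha_T = 0.5, 2.0                      # values quoted in the table above
    ref_logits = np.zeros((n_states, n_actions))     # uniform reference policy pi_nu
    # Stand-in for the targets that the adversarial reward (d_xi, psi_phi) would
    # produce in the full algorithm.
    target_logits = rng.normal(size=(n_states, n_actions))

    for t in range(T):
        eta_t = step_size_schedule(t, T, alpha_1, alpha_T)
        ref_logits = mirror_descent_update(ref_logits, target_logits, eta_t)

    print(softmax(ref_logits).round(3))
```

Under these assumptions η_t decreases from 2.0 to 0.5 over training, consistent with the quoted behaviour that η_1 is greater than 1 (α_1 = 0.5) and η_T is less than 1 (α_T = 2).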