Robust Imitation via Mirror Descent Inverse Reinforcement Learning

Authors: Dong-Sig Han, Hyunseo Kim, Hyundo Lee, JeHwan Ryu, Byoung-Tak Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our IRL method was applied on top of an adversarial framework, and it outperformed existing adversarial methods in an extensive suite of benchmarks.
Researcher Affiliation | Academia | Artificial Intelligence Institute, Seoul National University; {dshan, hskim, hdlee, jhryu, btzhang}@bi.snu.ac.kr
Pseudocode | Yes | Algorithm 1: Mirror Descent Adversarial Inverse Reinforcement Learning.
Open Source Code | No | Our empirical studies can be reproduced from the detailed information in Appendices B and C. (This statement points to 'detailed information' for reproduction, not explicitly to open-source code provided via a link or in the supplementary materials. Without checking the appendices, the main text does not provide concrete access to the source code.)
Open Datasets | Yes | MuJoCo [19] benchmarks; The MuJoCo simulator used in our experiments is freely available to everyone. See the site (https://mujoco.org).
Dataset Splits | No | The paper discusses training with expert demonstrations and different numbers of episodes but does not provide explicit details on train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | Yes | In experiments, each algorithm was executed in CPU (a single thread).
Software Dependencies | No | The paper mentions software such as RAC, SAC, and TensorFlow, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Input: trajectories {τ_t}_{t=1}^T, an agent π_θ, a reference policy π_ν, a neural network d_ξ: S → ℝ, a regularized reward function ψ_φ ∈ Ψ_Ω(Π), α_1, α_T, and λ. Fig. 5 shows that the Bregman divergence was large for MD-AIRL at the early training phase, because the initial step size η_1 was chosen to be greater than 1 (α_1 = 0.5), and MD-AIRL outperformed RAIRL in four cases by choosing an effectively low final step size η_T, less than 1 (α_T = 2).
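
The Experiment Setup row references Algorithm 1's inputs and a step-size schedule that starts at η_1 > 1 early in training and ends at η_T < 1. As a rough illustration of what such a decaying schedule and a mirror descent policy update look like, the minimal Python sketch below assumes a linearly interpolated schedule and the generic negative-entropy (exponentiated-gradient) form of mirror descent over a discrete action simplex. The function names are hypothetical, and this is not the exact regularized update or the α-parameterized schedule used in MD-AIRL (those are specified in Algorithm 1 and Appendix B of the paper).

    import numpy as np

    def step_size_schedule(eta_1, eta_T, T):
        # Hypothetical linear interpolation from eta_1 down to eta_T.
        return np.linspace(eta_1, eta_T, T)

    def mirror_descent_update(pi_t, rewards, eta_t):
        # Negative-entropy mirror map gives the multiplicative-weights update:
        # pi_{t+1}(a) is proportional to pi_t(a) * exp(eta_t * r(a)).
        logits = np.log(pi_t + 1e-12) + eta_t * rewards
        logits -= logits.max()  # subtract max for numerical stability
        pi_next = np.exp(logits)
        return pi_next / pi_next.sum()

    pi = np.ones(3) / 3                      # uniform start over 3 actions
    r = np.array([0.1, 0.5, -0.2])           # fixed toy rewards
    for eta in step_size_schedule(1.2, 0.3, T=5):
        pi = mirror_descent_update(pi, r, eta)
    print(pi)  # probability mass shifts toward the highest-reward action

The decaying step size mirrors the behavior described in the row: large early steps (η_1 > 1) move the policy aggressively, which is consistent with the large early Bregman divergence reported for MD-AIRL, while small final steps (η_T < 1) make later updates conservative.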