Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Robust Imitation via Mirror Descent Inverse Reinforcement Learning
Authors: Dong-Sig Han, Hyunseo Kim, Hyundo Lee, JeHwan Ryu, Byoung-Tak Zhang
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our IRL method was applied on top of an adversarial framework, and it outperformed existing adversarial methods in an extensive suite of benchmarks. |
| Researcher Affiliation | Academia | Artificial Intelligence Institute, Seoul National University EMAIL |
| Pseudocode | Yes | Algorithm 1 Mirror Descent Adversarial Inverse Reinforcement Learning. |
| Open Source Code | No | Our empirical studies can be reproduced from the detailed information in Appendices B and C. (This statement refers to 'detailed information' for reproduction, not to source code provided via a link or in the supplementary materials. The appendices were not checked, and the main text gives no concrete access to the source code.) |
| Open Datasets | Yes | MuJoCo [19] benchmarks and The MuJoCo simulator used in our experiments is freely available to everyone. See the site (https://mujoco.org). |
| Dataset Splits | No | The paper discusses training with expert demonstrations and different numbers of episodes but does not provide explicit details on train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | In experiments, each algorithm was executed in CPU (a single thread). |
| Software Dependencies | No | The paper mentions software like RAC, SAC, and TensorFlow, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Input: trajectories $\{\tau_t\}_{t=1}^{T}$, an agent $\pi_\theta$, a reference policy $\pi_\nu$, a neural network $d_\xi : \mathcal{S} \to \mathbb{R}$, a regularized reward function $\psi_\phi \in \Psi_\Omega(\Pi)$, $\alpha_1$, $\alpha_T$, and $\lambda$. and Fig. 5 shows that the Bregman divergence was large for MD-AIRL at the early training phase, because we chose the initial step size $\eta_1$ to be greater than 1 ($\alpha_1 = 0.5$). and MD-AIRL outperformed RAIRL in four cases by choosing an effectively low step size at $\eta_T$ to be less than 1 ($\alpha_T = 2$). |
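
The Experiment Setup row ties a large early step size to a large Bregman divergence. The following is a minimal, generic sketch of that relationship and not the paper's MD-AIRL algorithm. It assumes a textbook mirror descent step on the probability simplex with the negative-entropy mirror map, whose Bregman divergence is the KL divergence. The function names, the toy cost vector `c`, and the two step sizes are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """Bregman divergence of negative entropy between two points on the simplex."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mirror_descent_step(x, grad, eta):
    """One entropic mirror descent step: x_new is proportional to x * exp(-eta * grad)."""
    weights = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy problem: minimize the linear loss <c, x> over the simplex.
c = [1.0, 0.5, 0.1]
x = [1 / 3, 1 / 3, 1 / 3]

# Illustrative step sizes, one above 1 and one below 1 (assumed values).
large = mirror_descent_step(x, c, eta=2.0)
small = mirror_descent_step(x, c, eta=0.5)

print("divergence after large step:", kl_divergence(large, x))
print("divergence after small step:", kl_divergence(small, x))
```

In this toy setting the iterate produced with the larger step size lies farther from the starting point in KL divergence, which is the qualitative effect the quoted passage describes for the early training phase.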