LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Authors: Firas Al-Hafez, Davide Tateo, Oleg Arenz, Guoping Zhao, Jan Peters

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on six MuJoCo environments: Ant-v3, Walker2d-v3, Hopper-v3, HalfCheetah-v3, Humanoid-v3, and Atlas. The latter is a novel locomotion environment introduced by us and is further described in Appendix C.1. We select the following baselines: GAIL (Ho & Ermon, 2016), VAIL (Peng et al., 2019), IQ-Learn (Garg et al., 2021) and SQIL (Reddy et al., 2020)." (an environment sketch follows the table)
Researcher Affiliation | Academia | "Firas Al-Hafez (1), Davide Tateo (1), Oleg Arenz (1), Guoping Zhao (2), Jan Peters (1,3); (1) Intelligent Autonomous Systems, (2) Locomotion Laboratory, (3) German Research Center for AI (DFKI), Centre for Cognitive Science, Hessian.AI; TU Darmstadt, Germany; {name.surname}@tu-darmstadt.de"
Pseudocode | Yes | "Algorithm 1 LS-IQ" (an illustrative loss sketch follows the table)
Open Source Code | Yes | "The code is available at https://github.com/robfiras/ls-iq"
Open Datasets | Yes | "We evaluate our method on six MuJoCo environments: Ant-v3, Walker2d-v3, Hopper-v3, HalfCheetah-v3, Humanoid-v3, and Atlas. ... The code for the environment as well as the expert data is available at https://github.com/robfiras/ls-iq."
Dataset Splits | No | The paper states "We use ten seeds and five expert trajectories for these experiments." and mentions per-environment hyperparameter tuning ("we tune on each environment"), but it does not specify explicit validation splits (e.g., percentages or counts) for hyperparameter selection or model evaluation in the main text.
Hardware Specification | Yes | "Calculations for this research were conducted on the Lichtenberg high-performance computer of the TU Darmstadt."
Software Dependencies | No | The paper names its framework but lists no versioned dependencies: "For a fair comparison, all methods are implemented in the same framework, MushroomRL (D'Eramo et al., 2021). We verify that our implementations achieve comparable results to the original implementations by the authors."
Experiment Setup | Yes | "We use the hyperparameters proposed by the original authors for the respective environments and perform a grid search on novel environments. ... For our method, we use the same hyperparameters as IQ-Learn, except for the regularizer coefficient c and the entropy coefficient β, which we tune on each environment. We only consider equal mixing, i.e., α = 0.5. ... We use ten seeds and five expert trajectories for these experiments." (a grid-search sketch follows the table)
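
The five standard tasks listed in the Research Type row are stock Gym MuJoCo environments and can be instantiated directly; Atlas is the authors' own environment and ships with their repository. A minimal sketch, assuming an OpenAI Gym installation with MuJoCo support:

    import gym

    # The five standard MuJoCo locomotion tasks used in the paper's evaluation.
    STANDARD_ENVS = ["Ant-v3", "Walker2d-v3", "Hopper-v3",
                     "HalfCheetah-v3", "Humanoid-v3"]
    envs = {name: gym.make(name) for name in STANDARD_ENVS}

    # Atlas is not part of stock Gym: it is distributed with the authors'
    # code (https://github.com/robfiras/ls-iq) together with the expert data.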
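
The Pseudocode row refers to Algorithm 1 (LS-IQ), which is not reproduced here. As a rough illustration of the ingredients named in the Experiment Setup row (a regularizer coefficient c and equal mixing with α = 0.5), below is a sketch of an IQ-Learn-style critic loss with a squared regularizer on the implicit reward r(s, a) = Q(s, a) − γV(s'), averaged over an equal mixture of expert and policy samples. The paper's exact objective, targets, and absorbing-state handling differ; every name here is an assumption, not the authors' implementation:

    import torch

    def implicit_reward(q_net, v_net, batch, gamma=0.99) -> torch.Tensor:
        # Implicit reward recovered from the critic: r = Q(s, a) - gamma * V(s').
        s, a, s_next = batch["state"], batch["action"], batch["next_state"]
        return q_net(s, a) - gamma * v_net(s_next)

    def ls_iq_style_loss(q_net, v_net, expert_batch, policy_batch,
                         gamma=0.99, c=0.5, alpha=0.5) -> torch.Tensor:
        # Illustrative only: maximize the implicit reward on expert data while
        # penalizing squared implicit rewards on an equal (alpha = 0.5) mixture
        # of expert and policy samples, loosely following the LS-IQ idea.
        r_exp = implicit_reward(q_net, v_net, expert_batch, gamma)
        r_pol = implicit_reward(q_net, v_net, policy_batch, gamma)
        reward_term = -r_exp.mean()
        reg_term = c * (alpha * (r_exp ** 2).mean()
                        + (1.0 - alpha) * (r_pol ** 2).mean())
        return reward_term + reg_term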
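
The Experiment Setup row states that c and β are tuned per environment via grid search, with ten seeds and five expert trajectories per run. A minimal sketch of how such a sweep could be organized; the grid values and run_experiment are hypothetical placeholders, since the paper does not list the searched values:

    from itertools import product

    C_VALUES = [0.25, 0.5, 1.0]      # hypothetical grid for the regularizer coefficient c
    BETA_VALUES = [0.01, 0.05, 0.1]  # hypothetical grid for the entropy coefficient beta
    SEEDS = range(10)                # "We use ten seeds ..."
    N_EXPERT_TRAJECTORIES = 5        # "... and five expert trajectories"

    def run_experiment(env_name, c, beta, seed, n_trajectories):
        # Placeholder: a real run would train LS-IQ with this configuration
        # and return an evaluation metric such as average return.
        print(f"env={env_name} c={c} beta={beta} seed={seed} trajs={n_trajectories}")

    for c, beta, seed in product(C_VALUES, BETA_VALUES, SEEDS):
        run_experiment("Atlas", c=c, beta=beta, seed=seed,
                       n_trajectories=N_EXPERT_TRAJECTORIES)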