Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
Authors: Andreas Schlaginhaufen, Maryam Kamgarpour
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate our results in a gridworld environment (Section 5). |
| Researcher Affiliation | Academia | Andreas Schlaginhaufen SYCAMORE, EPFL andreas.schlaginhaufen@epfl.ch Maryam Kamgarpour SYCAMORE, EPFL maryam.kamgarpour@epfl.ch |
| Pseudocode | Yes | Algorithm 1: Multi-expert IRL (see the sketch after the table) |
| Open Source Code | Yes | 1The code is openly accessible at https://github.com/andrschl/transfer_irl. |
| Open Datasets | Yes | To validate our results experimentally, we adopt a stochastic variant of the Windy Gridworld environment [Sutton and Barto, 2018]. |
| Dataset Splits | No | The paper does not explicitly mention a dedicated validation dataset split, only "expert data sets" for learning. |
| Hardware Specification | Yes | All our experiments were carried out within a day on a MacBook Pro with an Apple M1 Pro chip and 32 GB of RAM. |
| Software Dependencies | No | The paper mentions general software components like "soft policy iteration" but does not specify version numbers for any libraries or dependencies. |
| Experiment Setup | Yes | Using Shannon entropy regularization with τ = 0.3, we then use soft policy iteration to get expert policies for each combination of expert reward and wind strength β. For each of these expert policies, we then generate expert data sets with N_E ∈ {10^3, 10^4, 10^5, 10^6} trajectories of length H = 100. Next, we run Algorithm 1, with soft policy iteration as a subroutine, for 30 000 iterations, where rewards are initialized by sampling from a standard normal distribution. As a reward class, we choose the ℓ1-ball with radius 10^3 (essentially unbounded), and as a stepsize, α = 0.05 for the first 15 000 iterations and α = 0.005 for the second half. Moreover, we sample N = 100 new trajectories of horizon H = 100 at each gradient step. (A minimal sketch of this setup follows the table.) |
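
The Experiment Setup row quotes entropy-regularized soft policy iteration as the subroutine used both to compute the expert policies and inside Algorithm 1. Below is a minimal sketch of that subroutine for a tabular MDP with Shannon entropy regularization (τ = 0.3), written as a soft Bellman (value) iteration, which converges to the same soft-optimal policy. The transition tensor `P`, the state-action reward `r`, and all function names are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def soft_policy_iteration(P, r, gamma=0.99, tau=0.3, n_iters=200):
    """Tabular soft Bellman iteration for an entropy-regularized MDP (sketch).

    P: (S, A, S) transition probabilities, r: (S, A) rewards, tau: entropy temperature.
    Returns the induced softmax (Boltzmann) policy and the soft Q-values.
    """
    S, A, _ = P.shape
    q = np.zeros((S, A))
    for _ in range(n_iters):
        # Soft state value v(s) = tau * log sum_a exp(q(s, a) / tau), computed stably.
        q_max = q.max(axis=1)
        v = q_max + tau * np.log(np.exp((q - q_max[:, None]) / tau).sum(axis=1))
        # Soft Bellman backup under the entropy-regularized objective.
        q = r + gamma * P @ v
    # Policy proportional to exp(q / tau).
    pi = np.exp((q - q.max(axis=1, keepdims=True)) / tau)
    return pi / pi.sum(axis=1, keepdims=True), q
```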
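
The Pseudocode and Experiment Setup rows describe Algorithm 1 (multi-expert IRL) as a 30 000-step loop that resolves the learner policy under the current reward, samples N = 100 fresh trajectories of horizon H = 100, and updates the reward within the chosen reward class. The sketch below, which reuses `soft_policy_iteration` from above, shows one plausible reading of that loop as projected gradient ascent toward the experts' empirical state-action visitations. The occupancy estimator, the hypothetical `sample_trajs` callable, and the rescaling used in place of the exact ℓ1-ball projection are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def empirical_occupancy(trajectories, S, A, gamma=0.99):
    """Discounted empirical state-action visitation from a list of [(s, a), ...] trajectories."""
    mu = np.zeros((S, A))
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            mu[s, a] += gamma ** t
    return mu / max(len(trajectories), 1)

def multi_expert_irl(Ps, expert_trajs, sample_trajs, S, A,
                     n_iters=30_000, radius=1e3, tau=0.3, N=100, H=100, seed=0):
    """Projected-gradient reward recovery from several experts, one environment per expert (sketch).

    Ps:           list of (S, A, S) transition tensors (e.g. one per wind strength beta).
    expert_trajs: list of expert trajectory datasets, aligned with Ps.
    sample_trajs: hypothetical callable (P, pi, n, horizon) -> list of trajectories.
    """
    rng = np.random.default_rng(seed)
    r = rng.standard_normal((S, A))                  # reward initialized from a standard normal
    mu_experts = [empirical_occupancy(trajs, S, A) for trajs in expert_trajs]
    for k in range(n_iters):
        alpha = 0.05 if k < n_iters // 2 else 0.005  # stepsize schedule quoted in the setup row
        grad = np.zeros_like(r)
        for P, mu_e in zip(Ps, mu_experts):
            pi, _ = soft_policy_iteration(P, r, tau=tau)   # subroutine under the current reward
            learner = sample_trajs(P, pi, n=N, horizon=H)  # N fresh rollouts of horizon H
            grad += (mu_e - empirical_occupancy(learner, S, A)) / len(Ps)
        r = r + alpha * grad
        l1 = np.abs(r).sum()
        if l1 > radius:            # crude rescaling as a stand-in for the exact l1-ball
            r *= radius / l1       # projection; with radius 10^3 it is rarely active
    return r
```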