Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Identifiability and Generalizability in Constrained Inverse Reinforcement Learning

Authors: Andreas Schlaginhaufen, Maryam Kamgarpour

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 6, we experimentally verify our results in a gridworld environment.
Researcher Affiliation | Academia | SYCAMORE Lab, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland. Correspondence to: Andreas Schlaginhaufen <EMAIL>.
Pseudocode | Yes | Algorithm 1: Gradient Descent Ascent for Constrained Entropy-Regularized IRL
Open Source Code | Yes | The code to all our experiments is available at: https://github.com/andrschl/cirl
Open Datasets | Yes | We consider a gridworld environment (Sutton & Barto, 2018)
Dataset Splits | No | The paper mentions varying the number of expert trajectories (N) and trajectory length (T) but does not specify a training/validation/test split for the dataset itself.
Hardware Specification | No | The paper describes the simulated environment but does not provide any specific hardware details such as GPU/CPU models or memory used for running the experiments.
Software Dependencies | Yes | feasibility is checked via the LP solver linprog provided by (Virtanen et al., 2020).
Experiment Setup | Yes | We consider a gridworld environment... with 36 states (the grid cells) and 4 actions (up, down, left, right). The agent has a 90% chance of reaching the desired location when taking an action and a 10% chance of ending up in a random neighboring grid cell. We choose the entropy regularization $f(\mu) = \mathbb{E}_{(s,a)\sim\mu}[\mathcal{H}(\pi_\mu(\cdot|s))]$. We use a primal-dual gradient-descent-ascent method... with $N \in \{10, 100, 1000, 10000\}$ trajectories of length $T = 10000$.
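The Experiment Setup row describes the environment only in words. What follows is a minimal NumPy sketch of how such a gridworld and the entropy regularizer $f(\mu)$ could be written. It is not the authors' code, which lives in the linked repository. Several details are assumptions made for illustration: the 36 cells form a 6×6 grid, off-grid moves leave the agent in place, and the 10% slip mass is spread uniformly over the cell's up/down/left/right neighbors (possibly including the intended cell). The function names `build_transitions` and `entropy_regularizer` are hypothetical. The sketch also takes $\mu$ to be a normalized state-action occupancy measure with induced policy $\pi_\mu(a|s) = \mu(s,a) / \sum_{a'} \mu(s,a')$.

```python
import numpy as np

# Hypothetical 6x6 layout for the 36-state gridworld (assumption, not from the paper).
SIZE = 6
N_STATES = SIZE * SIZE
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right as (row, col) steps
N_ACTIONS = len(ACTIONS)


def neighbors(s):
    """Cells reachable from s by one up/down/left/right step; off-grid cells are excluded."""
    r, c = divmod(s, SIZE)
    out = []
    for dr, dc in ACTIONS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < SIZE and 0 <= cc < SIZE:
            out.append(rr * SIZE + cc)
    return out


def step_target(s, a):
    """Intended cell for action a; moving into a wall keeps the agent in place (assumption)."""
    r, c = divmod(s, SIZE)
    dr, dc = ACTIONS[a]
    rr = min(max(r + dr, 0), SIZE - 1)
    cc = min(max(c + dc, 0), SIZE - 1)
    return rr * SIZE + cc


def build_transitions(p_success=0.9):
    """P[s, a, s']: p_success on the intended cell, the rest spread uniformly over neighbors."""
    P = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    for s in range(N_STATES):
        nbrs = neighbors(s)
        for a in range(N_ACTIONS):
            P[s, a, step_target(s, a)] += p_success
            for s2 in nbrs:
                P[s, a, s2] += (1.0 - p_success) / len(nbrs)
    return P


def entropy_regularizer(mu):
    """f(mu) = E_{(s,a)~mu}[H(pi_mu(.|s))] for a normalized occupancy measure mu of shape (S, A)."""
    state_mass = mu.sum(axis=1, keepdims=True)
    # Induced policy pi_mu(a|s); states with zero mass get an all-zero row.
    pi = np.divide(mu, state_mass, out=np.zeros_like(mu), where=state_mass > 0)
    # Treat 0 * log 0 as 0 by only taking logs of positive entries.
    log_pi = np.log(pi, out=np.zeros_like(pi), where=pi > 0)
    H = -(pi * log_pi).sum(axis=1)  # entropy of the induced policy in each state
    # Sum_{s,a} mu(s,a) H(s) equals Sum_s (Sum_a mu(s,a)) H(s).
    return float((state_mass[:, 0] * H).sum())
```

As a sanity check, a uniform occupancy measure induces a uniform policy over the 4 actions in every state, so $f(\mu) = \log 4 \approx 1.386$. A measure that puts all of a state's mass on a single action gives $f(\mu) = 0$.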
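The Software Dependencies row says only that feasibility is checked with SciPy's `linprog` (Virtanen et al., 2020). It does not say which problem is checked. As one plausible illustration, and purely as an assumption, the sketch below asks whether a constrained MDP admits a discounted occupancy measure $\mu \ge 0$ that satisfies the Bellman flow constraints together with one expected-cost budget. It does this by solving a linear program with a zero objective. The function `is_feasible`, its arguments, and the toy two-state MDP are hypothetical. Only `scipy.optimize.linprog` with `method="highs"` is a real API.

```python
import numpy as np
from scipy.optimize import linprog


def is_feasible(P, nu0, gamma, cost, budget):
    """Return True if some occupancy measure mu satisfies the flow constraints and E_mu[cost] <= budget.

    P: (S, A, S) transition kernel, nu0: (S,) initial distribution,
    cost: (S, A) constraint cost, budget: scalar threshold (all hypothetical inputs).
    """
    S, A, _ = P.shape
    # mu is flattened so that mu(s, a) sits at index s * A + a.
    # Flow constraint for each state s:
    # sum_a mu(s, a) - gamma * sum_{s', a'} P(s | s', a') mu(s', a') = (1 - gamma) * nu0(s)
    A_eq = np.zeros((S, S * A))
    for s in range(S):
        A_eq[s, s * A:(s + 1) * A] = 1.0
        A_eq[s] -= gamma * P[:, :, s].reshape(-1)
    b_eq = (1.0 - gamma) * nu0
    # One expected-cost constraint: sum_{s,a} cost(s, a) mu(s, a) <= budget
    A_ub = cost.reshape(1, -1)
    b_ub = np.array([budget])
    # Zero objective: only feasibility matters.
    res = linprog(c=np.zeros(S * A), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    # linprog status 0 means a solution was found; status 2 means the LP is infeasible.
    return res.status == 0


# Toy MDP (hypothetical): action 0 stays in place, action 1 switches state.
P = np.zeros((2, 2, 2))
for s in range(2):
    P[s, 0, s] = 1.0
    P[s, 1, 1 - s] = 1.0
nu0 = np.array([1.0, 0.0])                 # always start in state 0
cost = np.array([[1.0, 1.0], [0.0, 0.0]])  # unit cost for any action taken in state 0
print(is_feasible(P, nu0, 0.9, cost, 0.2))   # True
print(is_feasible(P, nu0, 0.9, cost, 0.05))  # False
```

In this toy MDP the cheapest policy leaves state 0 at the first step. That incurs a normalized expected cost of $1 - \gamma = 0.1$, so a budget of 0.2 is feasible and a budget of 0.05 is not. Using a zero objective keeps the LP a pure feasibility test, and any vertex HiGHS returns is a valid witness.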