Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Identifiability and Generalizability in Constrained Inverse Reinforcement Learning
Authors: Andreas Schlaginhaufen, Maryam Kamgarpour
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6, we experimentally verify our results in a gridworld environment. |
| Researcher Affiliation | Academia | SYCAMORE Lab, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland. Correspondence to: Andreas Schlaginhaufen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Gradient Descent Ascent for Constrained Entropy-Regularized IRL |
| Open Source Code | Yes | The code to all our experiments is available at: https://github.com/andrschl/cirl |
| Open Datasets | Yes | We consider a gridworld environment (Sutton & Barto, 2018) |
| Dataset Splits | No | The paper mentions varying the number of expert trajectories (N) and trajectory length (T) but does not specify a training/validation/test split for the dataset itself. |
| Hardware Specification | No | The paper describes the simulated environment but does not provide any specific hardware details such as GPU/CPU models or memory used for running the experiments. |
| Software Dependencies | Yes | feasibility is checked via the LP solver linprog provided by (Virtanen et al., 2020). |
| Experiment Setup | Yes | We consider a gridworld environment... with 36 states (the grid cells) and 4 actions (up, down, left, right). The agent has a 90% chance of reaching the desired location when taking an action and a 10% chance of ending up in a random neighboring grid cell. We choose the entropy regularization $f(\mu) = \mathbb{E}_{(s,a)\sim\mu}[H(\pi_\mu(\cdot \mid s))]$. We use a primal-dual gradient-descent-ascent method... with $N \in \{10, 100, 1000, 10000\}$ trajectories of length $T = 10000$. |
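The setup quoted in the last row can be illustrated with a small, self-contained sketch. It is a hypothetical reconstruction and is not taken from the authors' repository. It builds a 36-state gridworld transition kernel with a 90% intended move and a 10% random slip, and it evaluates the entropy regularizer $f(\mu)$ for a given occupancy measure. Several details are assumptions not stated in the quoted text: the 36 states form a row-major 6×6 grid, the action order is up/down/left/right, a move off the grid leaves the agent in place, the 10% slip mass is spread uniformly over the four compass moves, and $\pi_\mu(a \mid s) = \mu(s,a) / \sum_{a'} \mu(s,a')$ is the policy induced by the occupancy measure. The function names `transition_kernel` and `entropy_regularizer` are illustrative only.

```python
"""Hypothetical sketch of the gridworld setup (not the authors' code)."""

import math
from typing import Dict, List, Tuple

SIZE = 6                      # assumption: 36 states laid out as a 6 x 6 grid
N_STATES = SIZE * SIZE
# Assumed action order: up, down, left, right (row/column offsets).
ACTIONS: List[Tuple[int, int]] = [(-1, 0), (1, 0), (0, -1), (0, 1)]
P_INTENDED = 0.9              # chance of reaching the desired cell
P_SLIP = 0.1                  # chance of a random move instead


def move(state: int, delta: Tuple[int, int]) -> int:
    """Apply a move; moves off the grid leave the agent in place (assumption)."""
    row, col = divmod(state, SIZE)
    r, c = row + delta[0], col + delta[1]
    if 0 <= r < SIZE and 0 <= c < SIZE:
        return r * SIZE + c
    return state


def transition_kernel() -> List[List[Dict[int, float]]]:
    """Return P where P[s][a] maps each reachable next state to its probability."""
    kernel: List[List[Dict[int, float]]] = []
    for s in range(N_STATES):
        per_action: List[Dict[int, float]] = []
        for delta in ACTIONS:
            probs: Dict[int, float] = {}
            target = move(s, delta)
            probs[target] = probs.get(target, 0.0) + P_INTENDED
            # Slip mass spread uniformly over the four moves (assumption).
            for slip_delta in ACTIONS:
                slip = move(s, slip_delta)
                probs[slip] = probs.get(slip, 0.0) + P_SLIP / len(ACTIONS)
            per_action.append(probs)
        kernel.append(per_action)
    return kernel


def entropy_regularizer(mu: List[List[float]]) -> float:
    """f(mu) = sum_{s,a} mu(s,a) * H(pi_mu(.|s)), pi_mu(a|s) = mu(s,a) / sum_a' mu(s,a')."""
    total = 0.0
    for row in mu:
        mass = sum(row)       # state occupancy mu(s)
        if mass <= 0.0:
            continue          # unvisited states contribute nothing
        entropy = -sum((m / mass) * math.log(m / mass) for m in row if m > 0.0)
        total += mass * entropy
    return total


if __name__ == "__main__":
    P = transition_kernel()
    # State 14 (row 2, col 2), action "right" (index 3).
    print(sorted(P[14][3].items()))
    uniform = [[1.0 / (N_STATES * len(ACTIONS))] * len(ACTIONS) for _ in range(N_STATES)]
    print(entropy_regularizer(uniform))
```

Writing $f(\mu)$ as a sum over states, with each state's occupancy weighting the entropy of its induced action distribution, shows why a uniform occupancy measure over all state-action pairs attains the maximum value $\log 4$, while any deterministic induced policy gives $0$.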