Identifiability and Generalizability in Constrained Inverse Reinforcement Learning

Authors: Andreas Schlaginhaufen, Maryam Kamgarpour

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 6, we experimentally verify our results in a gridworld environment.
Researcher Affiliation | Academia | SYCAMORE Lab, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland. Correspondence to: Andreas Schlaginhaufen <andreas.schlaginhaufen@epfl.ch>.
Pseudocode | Yes | Algorithm 1 Gradient Descent Ascent for Constrained Entropy-Regularized IRL
Open Source Code | Yes | The code to all our experiments is available at: https://github.com/andrschl/cirl
Open Datasets | Yes | We consider a gridworld environment (Sutton & Barto, 2018)
Dataset Splits | No | The paper mentions varying the number of expert trajectories (N) and trajectory length (T) but does not specify a training/validation/test split for the dataset itself.
Hardware Specification | No | The paper describes the simulated environment but does not provide any specific hardware details such as GPU/CPU models or memory used for running the experiments.
Software Dependencies | Yes | feasibility is checked via the LP solver linprog provided by (Virtanen et al., 2020).
Experiment Setup | Yes | We consider a gridworld environment... with 36 states (the grid cells) and 4 actions (up, down, left, right). The agent has a 90% chance of reaching the desired location when taking an action and a 10% chance of ending up in a random neighboring grid cell. We choose the entropy regularization $f(\mu) = \mathbb{E}_{(s,a) \sim \mu}[H(\pi_\mu(\cdot \mid s))]$. We use a primal-dual gradient-descent-ascent method... with $N \in \{10, 100, 1000, 10000\}$ trajectories of length $T = 10000$.
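The gridworld dynamics quoted under Experiment Setup can be sketched as a transition tensor. This is a minimal illustration, not the authors' implementation: the 6x6 layout, the rule that moving into a wall leaves the agent in place, and the uniform choice among in-grid neighbors of the current cell for the 10% slip are all assumptions, and the names `build_transitions`, `SIZE`, and `P_SUCCESS` are hypothetical.

```python
import numpy as np

SIZE = 6          # assumption: 36 states arranged as a 6x6 grid
P_SUCCESS = 0.9   # probability of reaching the desired cell (from the quoted setup)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def to_state(row, col):
    """Map grid coordinates to a flat state index."""
    return row * SIZE + col


def in_grid(row, col):
    return 0 <= row < SIZE and 0 <= col < SIZE


def build_transitions():
    """Return P with P[s, a, s'] = probability of moving from s to s' under a."""
    n_states = SIZE * SIZE
    P = np.zeros((n_states, len(ACTIONS), n_states))
    for row in range(SIZE):
        for col in range(SIZE):
            s = to_state(row, col)
            # Assumption: the slip lands uniformly on an in-grid neighbor of the current cell.
            nbrs = [(row + dr, col + dc) for dr, dc in ACTIONS if in_grid(row + dr, col + dc)]
            for a, (dr, dc) in enumerate(ACTIONS):
                r, c = row + dr, col + dc
                # Assumption: moving into a wall leaves the agent in place.
                if not in_grid(r, c):
                    r, c = row, col
                P[s, a, to_state(r, c)] += P_SUCCESS
                for nr, nc in nbrs:
                    P[s, a, to_state(nr, nc)] += (1.0 - P_SUCCESS) / len(nbrs)
    return P


P = build_transitions()
```

Each slice `P[s, a, :]` sums to one, so the tensor can be passed to any tabular planning or occupancy-measure routine that expects a row-stochastic kernel.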
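The expectation in the quoted entropy regularization can be evaluated directly from a tabular occupancy measure. The sketch below assumes that the occupancy measure is stored as an array `mu` of shape (states, actions) summing to one, and that the induced policy is obtained by normalizing each row; the function names are hypothetical, and assigning a uniform policy to unvisited states is an arbitrary convention that does not affect the result.

```python
import numpy as np


def policy_from_occupancy(mu):
    """Induced policy pi(a|s) = mu(s, a) / sum_a' mu(s, a')."""
    state_mass = mu.sum(axis=1, keepdims=True)
    # Convention (assumption): states with zero mass get a uniform policy.
    uniform = np.full_like(mu, 1.0 / mu.shape[1])
    return np.divide(mu, state_mass, out=uniform, where=state_mass > 0)


def expected_policy_entropy(mu):
    """Compute E_{(s,a)~mu}[H(pi_mu(.|s))] = sum_s mu(s) * H(pi_mu(.|s))."""
    pi = policy_from_occupancy(mu)
    # log(1) = 0 is substituted where pi == 0, so those terms contribute nothing.
    log_pi = np.log(np.where(pi > 0, pi, 1.0))
    state_entropy = -(pi * log_pi).sum(axis=1)
    return float((mu.sum(axis=1) * state_entropy).sum())


# Example: a uniform occupancy measure over 36 states and 4 actions.
mu_uniform = np.full((36, 4), 1.0 / 144.0)
entropy_uniform = expected_policy_entropy(mu_uniform)
```

For the uniform occupancy measure the induced policy is uniform in every state, so the value equals log 4; a deterministic policy gives zero.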
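The Software Dependencies row mentions a feasibility check with SciPy's `linprog`. The quoted text does not say which problem is tested, so the following is only one plausible sketch: it asks whether some discounted occupancy measure satisfies the Bellman flow constraints together with a single linear cost budget. The two-state MDP, the cost vector, the discount factor, and the name `is_feasible` are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def is_feasible(P, nu0, gamma, cost, budget):
    """Check whether some occupancy measure mu >= 0 satisfies the Bellman
    flow constraints and the linear constraint <cost, mu> <= budget."""
    n_states, n_actions, _ = P.shape
    n_vars = n_states * n_actions
    # Row (s, a) of E has a 1 in column s; rows are ordered as s * n_actions + a.
    E = np.repeat(np.eye(n_states), n_actions, axis=0)
    P_flat = P.reshape(n_vars, n_states)
    A_eq = (E - gamma * P_flat).T
    b_eq = (1.0 - gamma) * nu0
    res = linprog(
        c=np.zeros(n_vars),  # zero objective: only feasibility matters
        A_ub=cost.reshape(1, n_vars),
        b_ub=[budget],
        A_eq=A_eq,
        b_eq=b_eq,
        bounds=(0, None),
        method="highs",
    )
    return res.status == 0  # status 0: a feasible (optimal) point was found


# Hypothetical two-state MDP: action 0 stays, action 1 switches state.
P = np.zeros((2, 2, 2))
for s in range(2):
    P[s, 0, s] = 1.0
    P[s, 1, 1 - s] = 1.0
nu0 = np.array([1.0, 0.0])                 # start in state 0
cost = np.array([[0.0, 0.0], [1.0, 1.0]])  # unit cost for occupying state 1

feasible_loose = is_feasible(P, nu0, 0.9, cost, budget=0.1)
feasible_tight = is_feasible(P, nu0, 0.9, cost, budget=-0.1)
```

Staying in state 0 forever incurs zero cost, so the budget of 0.1 is attainable, while a negative budget cannot be met because the cost is nonnegative.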