Identifiability and Generalizability in Constrained Inverse Reinforcement Learning
Authors: Andreas Schlaginhaufen, Maryam Kamgarpour
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6, we experimentally verify our results in a gridworld environment. |
| Researcher Affiliation | Academia | SYCAMORE Lab, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland. Correspondence to: Andreas Schlaginhaufen <andreas.schlaginhaufen@epfl.ch>. |
| Pseudocode | Yes | Algorithm 1: Gradient Descent Ascent for Constrained Entropy-Regularized IRL (see the GDA sketch after the table) |
| Open Source Code | Yes | The code to all our experiments is available at: https://github.com/andrschl/cirl |
| Open Datasets | Yes | We consider a gridworld environment (Sutton & Barto, 2018) |
| Dataset Splits | No | The paper mentions varying the number of expert trajectories (N) and trajectory length (T) but does not specify a training/validation/test split for the dataset itself. |
| Hardware Specification | No | The paper describes the simulated environment but does not provide any specific hardware details such as GPU/CPU models or memory used for running the experiments. |
| Software Dependencies | Yes | Feasibility is checked via the LP solver linprog provided by SciPy (Virtanen et al., 2020); see the feasibility-check sketch after the table. |
| Experiment Setup | Yes | We consider a gridworld environment... with 36 states (the grid cells) and 4 actions (up, down, left, right). The agent has a 90% chance of reaching the desired location when taking an action and a 10% chance of ending up in a random neighboring grid cell. We choose the entropy regularization f(µ) = E_{(s,a)∼µ}[H(π_µ(·\|s))]. We use a primal-dual gradient-descent-ascent method... with N ∈ {10, 100, 1000, 10000} trajectories of length T = 10000. (See the transition-model sketch after the table.) |
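
On the pseudocode row: Algorithm 1 in the paper is a projected gradient-descent-ascent scheme for constrained entropy-regularized IRL. Below is a minimal, generic GDA template in Python, assuming a primal variable updated by descent and a nonnegative dual variable updated by ascent; the function names, step sizes, and the toy constrained problem are ours, not the authors'. The concrete gradients (entropy-regularized policy evaluation, feature matching, constraint terms) are specified in the paper and its repository.

```python
import numpy as np

def gda(grad_theta, grad_lam, theta0, lam0,
        eta_theta=0.1, eta_lam=0.1, iters=2000):
    """Generic gradient descent ascent (a sketch, not the paper's exact
    Algorithm 1): descend on the primal variable theta, ascend on the dual
    variable lam, projecting lam onto the nonnegative orthant each step."""
    theta, lam = theta0.astype(float), lam0.astype(float)
    for _ in range(iters):
        theta = theta - eta_theta * grad_theta(theta, lam)
        lam = np.maximum(lam + eta_lam * grad_lam(theta, lam), 0.0)
    return theta, lam

# Toy check: min x^2 subject to x >= 1, with Lagrangian
# L(x, lam) = x^2 + lam * (1 - x); the saddle point is x = 1, lam = 2.
theta, lam = gda(lambda t, l: 2 * t - l,   # dL/dx
                 lambda t, l: 1.0 - t,     # dL/dlam
                 np.array([0.0]), np.array([0.0]))
print(theta, lam)  # approximately [1.], [2.]
```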
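
On the software-dependencies row: the paper checks feasibility with SciPy's linprog (Virtanen et al., 2020). A minimal sketch of a zero-objective feasibility LP, assuming placeholder constraint matrices rather than the paper's actual occupancy-measure constraints:

```python
import numpy as np
from scipy.optimize import linprog

def is_feasible(A_ub, b_ub, A_eq, b_eq):
    """Solve an LP with zero objective; linprog reports status 0 when an
    optimal (hence feasible) point is found and status 2 when infeasible."""
    n = A_eq.shape[1] if A_eq is not None else A_ub.shape[1]
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None),
                  method="highs")
    return res.status == 0

# Illustrative example (ours): does a probability vector x >= 0 with
# sum(x) = 1 and x[0] <= 0.5 exist? (Trivially yes, e.g. x = (0, 0.5, 0.5).)
A_eq = np.ones((1, 3)); b_eq = np.array([1.0])
A_ub = np.array([[1.0, 0.0, 0.0]]); b_ub = np.array([0.5])
print(is_feasible(A_ub, b_ub, A_eq, b_eq))  # True
```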
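
On the experiment-setup row: the environment is a 6×6 gridworld (36 states, 4 actions) where each action reaches the intended cell with probability 0.9 and a random neighboring cell with probability 0.1. A sketch of how such a transition tensor could be assembled, assuming moves off the grid leave the agent in place (the edge behavior is our assumption, not stated in the quoted setup):

```python
import numpy as np

GRID = 6                                     # 6 x 6 grid -> 36 states
N_STATES, N_ACTIONS = GRID * GRID, 4
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def clip_cell(r, c):
    """Assumed edge behavior: moves off the grid leave the agent in place."""
    return min(max(r, 0), GRID - 1), min(max(c, 0), GRID - 1)

def build_transitions():
    """P[s, a, s'] = probability of landing in s' after action a in s."""
    P = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    for s in range(N_STATES):
        r, c = divmod(s, GRID)
        neighbors = [clip_cell(r + dr, c + dc) for dr, dc in MOVES]
        for a in range(N_ACTIONS):
            # 90% chance of the intended cell ...
            nr, nc = neighbors[a]
            P[s, a, nr * GRID + nc] += 0.9
            # ... 10% spread uniformly over the neighboring cells.
            for mr, mc in neighbors:
                P[s, a, mr * GRID + mc] += 0.1 / len(neighbors)
    return P

P = build_transitions()
assert np.allclose(P.sum(axis=2), 1.0)  # each (s, a) row is a distribution
```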