Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Simplifying Constraint Inference with Inverse Reinforcement Learning
Authors: Adriana Hugessen, Harley Wiltzer, Glen Berseth
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct a series of experiments across several environments in order to answer the following questions: (1) How does IRL perform on constraint inference tasks compared to Lagrangian methods? and (2) How do the proposed modifications or regularizations over vanilla IRL improve performance on constraint inference tasks? |
| Researcher Affiliation | Academia | Adriana Hugessen (Mila, Université de Montréal); Harley Wiltzer (Mila, McGill University); Glen Berseth (Mila, Université de Montréal) |
| Pseudocode | Yes | Algorithm 1 IRL for ICRL Separate Critics |
| Open Source Code | Yes | Our code is made available at https://github.com/ahugs/simple-icrl. |
| Open Datasets | Yes | For our experiments, we consider the virtual environments for benchmarking inverse constraint learning, introduced by Liu et al. [2023a] since these were specially designed to test the performance of constraint inference tasks and also provide a recent baseline for Lagrangian-based constraint inference methods, including expert data. |
| Dataset Splits | No | We compare all methods according to average performance in the last 50 testing episodes and report statistics (IQM, Median, Mean and Optimality Gap) with bootstrapped 95% confidence intervals computed across five seeds according to the method recommended in Agarwal et al. [2021]. |
| Hardware Specification | No | Each run (consisting of five seeds) was trained on a node with a single GPU (varying GPU resources were used), 6 CPUs and 6GB of RAM per CPU. |
| Software Dependencies | Yes | All of our code is based on the Tianshou [Weng et al., 2022] and FSRL [Liu et al., 2023b] implementations of SAC and SAC-Lagrangian, respectively. Here we include all the hyperparameter configurations for our experiments. Any hyperparameters not listed here use the default hyperparameters in their respective libraries (Tianshou version 1.0.0 and FSRL version 0.1.0). |
| Experiment Setup | Yes | All of our code is based on the Tianshou [Weng et al., 2022] and FSRL [Liu et al., 2023b] implementations of SAC and SAC-Lagrangian, respectively. Here we include all the hyperparameter configurations for our experiments. Any hyperparameters not listed here use the default hyperparameters in their respective libraries (Tianshou version 1.0.0 and FSRL version 0.1.0). |
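
The Dataset Splits row quotes the paper as reporting IQM, median, mean and optimality gap with bootstrapped 95% confidence intervals across five seeds. As a rough illustration of what those statistics mean, here is a minimal standard-library sketch. It is not the authors' evaluation code: the function names, the plain percentile bootstrap over seeds, the target of 1.0 for the optimality gap, and the example scores are all assumptions made for illustration, and the procedure recommended by Agarwal et al. [2021] may differ in its details.

```python
import random
import statistics


def iqm(scores):
    """Interquartile mean: average of the middle 50% of the sorted scores."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))  # number of values dropped from each tail
    return statistics.fmean(ordered[cut:len(ordered) - cut])


def optimality_gap(scores, target=1.0):
    """Mean shortfall below `target`; scores above the target contribute zero."""
    return statistics.fmean([max(target - s, 0.0) for s in scores])


def bootstrap_ci(scores, statistic, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling the scores with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical normalized scores, one per seed (made-up numbers, not paper results).
seed_scores = [0.82, 0.91, 0.77, 0.88, 0.95]

point = iqm(seed_scores)
low, high = bootstrap_ci(seed_scores, iqm)
print(f"IQM = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
print(f"Optimality gap = {optimality_gap(seed_scores):.3f}")
```

With only five seeds the interval is necessarily wide, which is one reason to report it alongside the point estimate.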