Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Understanding Constraint Inference in Safety-Critical Inverse Reinforcement Learning

Authors: Bo Yue, Shufan Wang, Ashish Gaurav, Jian Li, Pascal Poupart, Guiliang Liu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results across various environments validate our theoretical findings, underscoring the nuanced trade-offs between complexity reduction and generalizability in safety-critical applications. We empirically evaluate the ICRL solver against the IRC solver in four different constrained Gridworld environments.
Researcher Affiliation Academia 1 School of Data Science, The Chinese University of Hong Kong, Shenzhen; 2 Stony Brook University; 3 University of Waterloo; 4 Vector Institute. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode Yes We study a uniform sampling strategy, detailed in Appendix Algorithm 1. This strategy queries the generative model to sample the state-action space, enabling the estimation of the transition dynamics and the expert policy as P̂ = (M̂, π̂_E), where M̂ = (M \ P_T) ∪ P̂_T.
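The quoted uniform sampling strategy can be illustrated with a minimal sketch. The `sample(s, a)` generative-model interface and the tabular state-action encoding are assumptions for illustration, not the paper's actual implementation; Appendix Algorithm 1 of the paper is the authoritative version.

```python
import numpy as np

def estimate_dynamics_uniform(sample, n_states, n_actions, n_samples_per_pair=10):
    """Estimate transition probabilities by querying a generative model
    uniformly over the state-action space.

    `sample(s, a)` is a hypothetical interface returning a next state
    drawn from the true dynamics P(. | s, a).
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples_per_pair):
                counts[s, a, sample(s, a)] += 1
    # Normalize counts into an empirical estimate of P(s' | s, a)
    return counts / counts.sum(axis=2, keepdims=True)
```

With deterministic dynamics, the estimate concentrates all mass on the observed successor state, e.g. `estimate_dynamics_uniform(lambda s, a: (s + a) % 3, 3, 2)`.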
Open Source Code Yes Code is available at https://github.com/Bobyue0118/Constraint-Inference-in-Safe-IRL.
Open Datasets No The paper uses expert demonstrations generated in simulated environments (constrained Gridworlds and the Mujoco-based Blocked Half-Cheetah task); no publicly available dataset is referenced.
Dataset Splits No For continuous environments, we use the maximum entropy framework of ICRL (Malik et al., 2021) and the simplified IRL framework for constraint inference (Hugessen et al., 2024), where the two frameworks recover the constraint knowledge that best explains the expert demonstrations from an offline dataset. This indicates an offline dataset is used, but no specific splits (e.g., train/test/validation percentages or counts) are provided.
Hardware Specification Yes We ran experiments on a desktop computer with an Intel(R) Core(TM) i5-14400F and an NVIDIA GeForce RTX 4060 Ti.
Software Dependencies No The Blocked Half-Cheetah task is built on Mujoco, where the agent controls a two-legged robot. While Mujoco is mentioned, no specific version number is provided for it or any other software dependency.
Experiment Setup Yes Experiment Setting. We focus on evaluating the training efficiency and transferability of the ICRL and IRC solvers. The results are assessed using two key metrics: 1) discounted cumulative rewards, which quantify the total rewards achieved by the learned policy; and 2) discounted cumulative costs, which quantify the total costs incurred by the learned policy. We compare the uniform sampling strategy (Appendix Algorithm 1) of the ICRL and IRC solvers. Table 3: List of the hyperparameters used in the Gridworld environment. Table 4: List of the hyperparameters used in the Half-Cheetah environment.