Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Understanding Constraint Inference in Safety-Critical Inverse Reinforcement Learning

Authors: Bo Yue, Shufan Wang, Ashish Gaurav, Jian Li, Pascal Poupart, Guiliang Liu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results across various environments validate our theoretical findings, underscoring the nuanced trade-offs between complexity reduction and generalizability in safety-critical applications. We empirically evaluate the ICRL solver against the IRC solver in four different constrained Gridworld environments.
Researcher Affiliation Academia 1 School of Data Science, The Chinese University of Hong Kong, Shenzhen; 2 Stony Brook University; 3 University of Waterloo; 4 Vector Institute. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode Yes We study a uniform sampling strategy, detailed in Appendix Algorithm 1. This strategy queries the generative model to sample the state-action space, enabling the estimation of the transition dynamics and the expert policy as P̂ = (M̂, π̂_E), where M̂ = (M \ P_T) ∪ P̂_T.
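The quoted uniform sampling strategy can be illustrated with a minimal sketch. The `sample(s, a)` generative-model interface and the tabular state-action encoding are assumptions for illustration, not the paper's actual implementation; Appendix Algorithm 1 of the paper is the authoritative version.

```python
import numpy as np

def estimate_dynamics_uniform(sample, n_states, n_actions, n_samples_per_pair=10):
    """Estimate transition probabilities by querying a generative model
    uniformly over the state-action space.

    `sample(s, a)` is a hypothetical interface returning a next state
    drawn from the true dynamics P(. | s, a).
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples_per_pair):
                counts[s, a, sample(s, a)] += 1
    # Normalize counts into an empirical estimate of P(s' | s, a)
    return counts / counts.sum(axis=2, keepdims=True)
```

With deterministic dynamics, the estimate concentrates all mass on the observed successor state, e.g. `estimate_dynamics_uniform(lambda s, a: (s + a) % 3, 3, 2)`.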
Open Source Code Yes Code is available at https://github.com/Bobyue0118/Constraint-Inference-in-Safe-IRL.
Open Datasets No The paper uses expert demonstrations generated in simulated environments (constrained Gridworlds and the Mujoco-based Blocked Half-Cheetah task); no publicly available dataset is referenced.
Dataset Splits No For continuous environments, we use the maximum entropy framework of ICRL (Malik et al., 2021) and the simplified IRL framework for constraint inference (Hugessen et al., 2024), where the two frameworks recover the constraint knowledge that best explains the expert demonstrations from an offline dataset. This indicates an offline dataset is used, but no specific splits (e.g., train/test/validation percentages or counts) are provided.
Hardware Specification Yes We ran experiments on a desktop computer with an Intel(R) Core(TM) i5-14400F and an NVIDIA GeForce RTX 4060 Ti.
Software Dependencies No The Blocked Half-Cheetah task is built on Mujoco, where the agent controls a two-legged robot. While Mujoco is mentioned, no specific version number is provided for it or any other software dependency.
Experiment Setup Yes Experiment Setting. We focus on evaluating the training efficiency and transferability of the ICRL and IRC solvers. The results are assessed using two key metrics: 1) discounted cumulative rewards, which quantify the total rewards achieved by the learned policy; and 2) discounted cumulative costs, which quantify the total costs incurred by the learned policy. We compare the uniform sampling strategy (Appendix Algorithm 1) of the ICRL and IRC solvers. Table 3: List of the hyperparameters used in the Gridworld environment. Table 4: List of the hyperparameters used in the Half-Cheetah environment.