Uncertainty-aware Constraint Inference in Inverse Constrained Reinforcement Learning

Authors: Sheng Xu, Guiliang Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Empirical results demonstrate that UAICRL consistently outperforms other baselines in continuous and discrete environments with stochastic dynamics.
Researcher Affiliation | Academia | Sheng Xu, School of Data Science, The Chinese University of Hong Kong, Shenzhen (shengxu1@link.cuhk.edu.cn); Guiliang Liu, School of Data Science, The Chinese University of Hong Kong, Shenzhen (liuguiliang@cuhk.edu.cn)
Pseudocode | Yes | Algorithm 1: Uncertainty-aware Inverse Constrained Reinforcement Learning (UAICRL); Algorithm 2: Distributional Lagrange Policy Optimization (DLPO); Algorithm 3: Flow-based Trajectory Generation (FTG). See the Lagrangian-step sketch after this table.
Open Source Code | Yes | The code is available at https://github.com/Jasonxu1225/UAICRL.
Open Datasets | Yes | We conduct empirical evaluations utilizing an ICRL benchmark (Liu et al., 2023), and extend it to include stochastic dynamics by incorporating noise into transitions. [...] The five MuJoCo robotics environments are built upon MuJoCo (see Figure C.1). [...] We conduct experiments in a realistic high-dimensional Highway Driving (HighD) environment (Krajewski et al., 2018; Liu et al., 2023)... See the noise-injection sketch after this table.
Dataset Splits | No | The paper discusses training and testing performance but does not explicitly provide train/validation/test splits or their sizes. The term 'validation' is used only in the context of 'validation of highly automated driving systems' when referencing the HighD dataset, not for data splitting.
Hardware Specification | Yes | For training the ICRL models, we utilized a total of 8 NVIDIA GeForce RTX 3090 GPUs, each equipped with 24 GB of memory. The training process was conducted on a single running node, utilizing 8 CPUs per task.
Software Dependencies | No | The paper mentions using the Adam optimization algorithm and MuJoCo environments, but does not provide specific version numbers for these or other software dependencies (e.g., PyTorch, TensorFlow, the specific MuJoCo version).
Experiment Setup | Yes | Table B.1 lists the hyperparameters used in UAICRL. To ensure equitable comparisons, the parameters of the same neural networks are kept consistent across models. Parameters include Expert Rollouts, Max Length, Gamma, PPO Steps, learning rates for the different networks, Quantiles, Risk Measure, and Risk Level. See the risk-measure sketch after this table.
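The "Distributional Lagrange Policy Optimization" name in the Pseudocode row indicates a Lagrangian-relaxation treatment of the learned constraint. As a point of reference, the sketch below shows the generic dual-ascent multiplier update that constrained policy optimization methods of this family build on; it is a minimal illustration under that assumption, not the repository's implementation, and `lagrange_step`, `episode_costs`, and `cost_budget` are hypothetical names.

```python
import numpy as np

def lagrange_step(lmbda, episode_costs, cost_budget, lr=0.01):
    """One dual-ascent update: raise lambda while average cost exceeds the budget."""
    violation = np.mean(episode_costs) - cost_budget
    return max(0.0, lmbda + lr * violation)  # project back onto lambda >= 0

rng = np.random.default_rng(0)
lmbda = 0.0
for _ in range(200):
    # Stand-in rollout costs; a real agent would collect these from the environment.
    episode_costs = rng.normal(loc=1.2, scale=0.3, size=32)
    lmbda = lagrange_step(lmbda, episode_costs, cost_budget=1.0)
print(f"lambda after 200 updates: {lmbda:.3f}")
```

The policy objective then penalizes the learned cost with weight `lmbda`, so the penalty strengthens exactly while the constraint is being violated and relaxes once it is satisfied.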
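The Open Datasets row notes that the benchmark is extended "by incorporating noise into transitions." One common way to make otherwise deterministic MuJoCo dynamics stochastic is an action-noise wrapper, sketched below with the Gymnasium API; the noise model, its scale, the choice of HalfCheetah, and the `NoisyTransitionWrapper` name are all assumptions, not the benchmark's actual mechanism.

```python
import gymnasium as gym
import numpy as np

class NoisyTransitionWrapper(gym.ActionWrapper):
    """Perturb each action with Gaussian noise so transitions become stochastic."""

    def __init__(self, env, noise_std=0.1, seed=0):
        super().__init__(env)
        self.noise_std = noise_std
        self.rng = np.random.default_rng(seed)

    def action(self, action):
        noise = self.rng.normal(0.0, self.noise_std, size=np.shape(action))
        # Keep the perturbed action inside the environment's action bounds.
        return np.clip(action + noise, self.action_space.low, self.action_space.high)

# HalfCheetah is a standard MuJoCo task (requires the `mujoco` package).
env = NoisyTransitionWrapper(gym.make("HalfCheetah-v4"), noise_std=0.1)
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```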
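Finally, the Quantiles, Risk Measure, and Risk Level entries in the hyperparameter list suggest a distributional critic that outputs cost quantiles and distorts them with a risk measure. The sketch below shows one standard choice, CVaR over the high-cost tail, assuming equally weighted quantile estimates; the paper's exact risk measure and discretization may differ, and `cvar_from_quantiles` is a hypothetical helper name.

```python
import numpy as np

def cvar_from_quantiles(quantiles, risk_level):
    """CVaR of cost: average of the worst (highest-cost) fraction of quantiles.

    Assumes equally weighted quantile estimates; `risk_level` is the tail
    fraction (e.g. 0.25 averages the worst 25% of outcomes).
    """
    q = np.sort(np.asarray(quantiles))
    n_tail = max(1, int(np.ceil(risk_level * q.size)))
    return q[-n_tail:].mean()

# Toy example: 32 quantile estimates of episodic cost from a distributional critic.
quantiles = np.random.default_rng(0).normal(loc=1.0, scale=0.5, size=32)
print(cvar_from_quantiles(quantiles, risk_level=0.25))  # exceeds the mean when the tail is heavy
```

Penalizing such a tail statistic instead of the mean makes the cost sensitive to rare but severe constraint violations, which is consistent with the paper's emphasis on uncertainty in stochastic dynamics.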