Uncertainty-aware Constraint Inference in Inverse Constrained Reinforcement Learning

Authors: Sheng Xu, Guiliang Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Empirical results demonstrate that UAICRL consistently outperforms other baselines in continuous and discrete environments with stochastic dynamics.
Researcher Affiliation | Academia | Sheng Xu, School of Data Science, The Chinese University of Hong Kong, Shenzhen (shengxu1@link.cuhk.edu.cn); Guiliang Liu, School of Data Science, The Chinese University of Hong Kong, Shenzhen (liuguiliang@cuhk.edu.cn)
Pseudocode | Yes | Algorithm 1: Uncertainty-aware Inverse Constrained Reinforcement Learning (UAICRL); Algorithm 2: Distributional Lagrange Policy Optimization (DLPO); Algorithm 3: Flow-based Trajectory Generation (FTG). See the Lagrangian-step sketch after this table.
Open Source Code | Yes | The code is available at https://github.com/Jasonxu1225/UAICRL.
Open Datasets | Yes | We conduct empirical evaluations utilizing an ICRL benchmark (Liu et al., 2023), and extend it to include stochastic dynamics by incorporating noise into transitions. [...] The five MuJoCo robotics environments are built upon MuJoCo (see Figure C.1). [...] We conduct experiments in a realistic high-dimensional Highway Driving (HighD) environment (Krajewski et al., 2018; Liu et al., 2023)... See the noise-injection sketch after this table.
Dataset Splits | No | The paper discusses training and testing performance but does not explicitly provide train/validation/test splits or their sizes. The term 'validation' is used only in the context of 'validation of highly automated driving systems' when referencing the HighD dataset, not for data splitting.
Hardware Specification | Yes | For training the ICRL models, we utilized a total of 8 NVIDIA GeForce RTX 3090 GPUs, each equipped with 24 GB of memory. The training process was conducted on a single running node, utilizing 8 CPUs per task.
Software Dependencies | No | The paper mentions using the Adam optimization algorithm and MuJoCo environments, but does not provide specific version numbers for these or other software dependencies (e.g., PyTorch, TensorFlow, the specific MuJoCo version).
Experiment Setup | Yes | Table B.1 lists the hyperparameters used in UAICRL. To ensure equitable comparisons, the parameters of the same neural networks are kept consistent across models. Parameters include Expert Rollouts, Max Length, Gamma, PPO Steps, learning rates for the different networks, Quantiles, Risk Measure, and Risk Level. See the risk-measure sketch after this table.
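The "Distributional Lagrange Policy Optimization" name in the Pseudocode row indicates a Lagrangian-relaxation treatment of the learned constraint. As a point of reference, the sketch below shows the generic dual-ascent multiplier update that constrained policy optimization methods of this family build on; it is a minimal illustration under that assumption, not the repository's implementation, and `lagrange_step`, `episode_costs`, and `cost_budget` are hypothetical names.

```python
import numpy as np

def lagrange_step(lmbda, episode_costs, cost_budget, lr=0.01):
    """One dual-ascent update: raise lambda while average cost exceeds the budget."""
    violation = np.mean(episode_costs) - cost_budget
    return max(0.0, lmbda + lr * violation)  # project back onto lambda >= 0

rng = np.random.default_rng(0)
lmbda = 0.0
for _ in range(200):
    # Stand-in rollout costs; a real agent would collect these from the environment.
    episode_costs = rng.normal(loc=1.2, scale=0.3, size=32)
    lmbda = lagrange_step(lmbda, episode_costs, cost_budget=1.0)
print(f"lambda after 200 updates: {lmbda:.3f}")
```

The policy objective then penalizes the learned cost with weight `lmbda`, so the penalty strengthens exactly while the constraint is being violated and relaxes once it is satisfied.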
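The Open Datasets row notes that the benchmark is extended "by incorporating noise into transitions." One common way to make otherwise deterministic MuJoCo dynamics stochastic is an action-noise wrapper, sketched below with the Gymnasium API; the noise model, its scale, the choice of HalfCheetah, and the `NoisyTransitionWrapper` name are all assumptions, not the benchmark's actual mechanism.

```python
import gymnasium as gym
import numpy as np

class NoisyTransitionWrapper(gym.ActionWrapper):
    """Perturb each action with Gaussian noise so transitions become stochastic."""

    def __init__(self, env, noise_std=0.1, seed=0):
        super().__init__(env)
        self.noise_std = noise_std
        self.rng = np.random.default_rng(seed)

    def action(self, action):
        noise = self.rng.normal(0.0, self.noise_std, size=np.shape(action))
        # Keep the perturbed action inside the environment's action bounds.
        return np.clip(action + noise, self.action_space.low, self.action_space.high)

# HalfCheetah is a standard MuJoCo task (requires the `mujoco` package).
env = NoisyTransitionWrapper(gym.make("HalfCheetah-v4"), noise_std=0.1)
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```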
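Finally, the Quantiles, Risk Measure, and Risk Level entries in the hyperparameter list suggest a distributional critic that outputs cost quantiles and distorts them with a risk measure. The sketch below shows one standard choice, CVaR over the high-cost tail, assuming equally weighted quantile estimates; the paper's exact risk measure and discretization may differ, and `cvar_from_quantiles` is a hypothetical helper name.

```python
import numpy as np

def cvar_from_quantiles(quantiles, risk_level):
    """CVaR of cost: average of the worst (highest-cost) fraction of quantiles.

    Assumes equally weighted quantile estimates; `risk_level` is the tail
    fraction (e.g. 0.25 averages the worst 25% of outcomes).
    """
    q = np.sort(np.asarray(quantiles))
    n_tail = max(1, int(np.ceil(risk_level * q.size)))
    return q[-n_tail:].mean()

# Toy example: 32 quantile estimates of episodic cost from a distributional critic.
quantiles = np.random.default_rng(0).normal(loc=1.0, scale=0.5, size=32)
print(cvar_from_quantiles(quantiles, risk_level=0.25))  # exceeds the mean when the tail is heavy
```

Penalizing such a tail statistic instead of the mean makes the cost sensitive to rare but severe constraint violations, which is consistent with the paper's emphasis on uncertainty in stochastic dynamics.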