Confidence Aware Inverse Constrained Reinforcement Learning

Authors: Sriram Ganapathi Subramanian, Guiliang Liu, Mohammed Elmahgiubi, Kasra Rezaee, Pascal Poupart

ICML 2024

Reproducibility Variable Result LLM Response
Research Type Experimental All experiments are repeated 50 times and we report the average and standard deviation of performance. Further, we conduct an unpaired two-sided t-test and report p-values for statistical significance. As is common in the literature, we consider p < 0.05 a statistically significant difference. All experiments are conducted in two phases.
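The significance test described above can be sketched with a stdlib-only Welch (unpaired, two-sided) t statistic; the function name and the sample data below are illustrative assumptions, not the authors' code. The two-sided p-value would then come from the t distribution's survival function (e.g. scipy.stats.t.sf), with p < 0.05 read as significant.

```python
import math
import statistics

def welch_t(a, b):
    """Unpaired two-sample (Welch) t statistic and degrees of freedom."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2**2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical per-run returns for two methods (the paper uses 50 repeats).
returns_a = [102.1, 99.8, 101.5, 98.7, 100.9, 103.2, 97.6, 100.4]
returns_b = [95.3, 96.8, 94.1, 97.5, 95.9, 93.8, 96.2, 94.7]
t, df = welch_t(returns_a, returns_b)
# Compare |t| against the t distribution on df degrees of freedom (e.g. via
# scipy.stats.t.sf) to obtain the two-sided p-value; p < 0.05 is significant.
```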
Researcher Affiliation Collaboration (1) Vector Institute for Artificial Intelligence, Toronto, Canada; (2) School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China; (3) Huawei Technologies Canada; (4) Cheriton School of Computer Science, University of Waterloo, Canada.
Pseudocode Yes The steps in confidence aware ICRL are given in Algorithm 1.
Open Source Code Yes All the code for the experiments has been open-sourced (Subramanian, 2024).
Open Datasets Yes The first is a set of virtual environments from the well-known MuJoCo (Todorov et al., 2012) simulator. The second is a realistic environment based on a highway driving task previously used by Liu et al. (2023). We consider a total of seven domains for the experiments (five within MuJoCo and two on the highway driving task).
Dataset Splits No The paper mentions training and testing phases but does not explicitly provide details about training/validation/test dataset splits, percentages, or absolute counts.
Hardware Specification Yes All the training for the experiments was conducted on a virtual machine with 2 Nvidia A100 GPUs, each with 40 GB of GPU memory. The CPUs are AMD EPYC processors with 125 GB of memory.
Software Dependencies No The paper mentions algorithms like PPO-Lagrange and model architectures like transformers but does not list specific software dependencies with version numbers (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup Yes For the virtual environments, the PPO-Lag batch size is 64, the hidden layer size is 64 and the number of hidden layers for the policy, value and cost networks is 3. For the HighD driving environments, the batch size of the constraint model is 1000, the hidden layer size is 64 and the number of hidden layers for the policy, value and cost networks is 3. Regarding CA-ICRL, we use transformers to implement the encoder blocks given in Figure 4. CA-ICRL contains 2 heads with 4 hidden layers. The value of β is 0.02 throughout.
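As a rough sketch, the reported hyperparameters can be collected into configuration dictionaries; the dictionary/key names and the mlp_layer_sizes helper are illustrative assumptions, not the authors' actual configuration code.

```python
# Hyperparameters as reported in the paper; names here are illustrative.
PPO_LAG_VIRTUAL = {          # MuJoCo (virtual) environments
    "batch_size": 64,
    "hidden_size": 64,
    "n_hidden_layers": 3,    # for each of the policy, value and cost networks
}
HIGHD_CONSTRAINT_MODEL = {   # HighD driving environments
    "batch_size": 1000,
    "hidden_size": 64,
    "n_hidden_layers": 3,
}
CA_ICRL_ENCODER = {          # transformer encoder blocks (Figure 4)
    "n_heads": 2,
    "n_hidden_layers": 4,
    "beta": 0.02,            # used throughout
}

def mlp_layer_sizes(obs_dim, out_dim, cfg):
    """Layer widths implied by a config: input, hidden layers, output."""
    return [obs_dim] + [cfg["hidden_size"]] * cfg["n_hidden_layers"] + [out_dim]
```

For example, a policy network over a 10-dimensional observation with 2 actions would use widths [10, 64, 64, 64, 2] under the virtual-environment config.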