Confidence Aware Inverse Constrained Reinforcement Learning

Authors: Sriram Ganapathi Subramanian, Guiliang Liu, Mohammed Elmahgiubi, Kasra Rezaee, Pascal Poupart

ICML 2024

Reproducibility Variable Result LLM Response
Research Type Experimental All experiments are repeated 50 times and we report the average and standard deviation of performance. Further, we conduct an unpaired two-sided t-test and report p-values for statistical significance. As is common in the literature, we consider p < 0.05 a statistically significant difference. All experiments are conducted in two phases.
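The significance test described above can be sketched with a stdlib-only Welch (unpaired, two-sided) t statistic; the function name and the sample data below are illustrative assumptions, not the authors' code. The two-sided p-value would then come from the t distribution's survival function (e.g. scipy.stats.t.sf), with p < 0.05 read as significant.

```python
import math
import statistics

def welch_t(a, b):
    """Unpaired two-sample (Welch) t statistic and degrees of freedom."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2**2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical per-run returns for two methods (the paper uses 50 repeats).
returns_a = [102.1, 99.8, 101.5, 98.7, 100.9, 103.2, 97.6, 100.4]
returns_b = [95.3, 96.8, 94.1, 97.5, 95.9, 93.8, 96.2, 94.7]
t, df = welch_t(returns_a, returns_b)
# Compare |t| against the t distribution on df degrees of freedom (e.g. via
# scipy.stats.t.sf) to obtain the two-sided p-value; p < 0.05 is significant.
```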
Researcher Affiliation Collaboration (1) Vector Institute for Artificial Intelligence, Toronto, Canada; (2) School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China; (3) Huawei Technologies Canada; (4) Cheriton School of Computer Science, University of Waterloo, Canada.
Pseudocode Yes The steps in confidence aware ICRL are given in Algorithm 1.
Open Source Code Yes All the code for the experiments has been open-sourced (Subramanian, 2024).
Open Datasets Yes The first is a set of virtual environments from the well-known MuJoCo (Todorov et al., 2012) simulator. The second is a realistic environment based on a highway driving task previously used by Liu et al. (2023). We consider a total of seven domains for the experiments (five within MuJoCo and two on the highway driving task).
Dataset Splits No The paper mentions training and testing phases but does not explicitly provide details about training/validation/test dataset splits, percentages, or absolute counts.
Hardware Specification Yes All the training for the experiments was conducted on a virtual machine with 2 Nvidia A100 GPUs, each with 40 GB of GPU memory. The CPUs are AMD EPYC processors with 125 GB of memory.
Software Dependencies No The paper mentions algorithms like PPO-Lagrange and model architectures like transformers but does not list specific software dependencies with version numbers (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup Yes For the virtual environments, the PPO-Lag batch size is 64, the hidden layer size is 64 and the number of hidden layers for the policy, value and cost networks is 3. For the HighD driving environments, the batch size of the constraint model is 1000, the hidden layer size is 64 and the number of hidden layers for the policy, value and cost networks is 3. Regarding CA-ICRL, we use transformers to implement the encoder blocks given in Figure 4. CA-ICRL contains 2 heads with 4 hidden layers. The value of β is 0.02 throughout.
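As a rough sketch, the reported hyperparameters can be collected into configuration dictionaries; the dictionary/key names and the mlp_layer_sizes helper are illustrative assumptions, not the authors' actual configuration code.

```python
# Hyperparameters as reported in the paper; names here are illustrative.
PPO_LAG_VIRTUAL = {          # MuJoCo (virtual) environments
    "batch_size": 64,
    "hidden_size": 64,
    "n_hidden_layers": 3,    # for each of the policy, value and cost networks
}
HIGHD_CONSTRAINT_MODEL = {   # HighD driving environments
    "batch_size": 1000,
    "hidden_size": 64,
    "n_hidden_layers": 3,
}
CA_ICRL_ENCODER = {          # transformer encoder blocks (Figure 4)
    "n_heads": 2,
    "n_hidden_layers": 4,
    "beta": 0.02,            # used throughout
}

def mlp_layer_sizes(obs_dim, out_dim, cfg):
    """Layer widths implied by a config: input, hidden layers, output."""
    return [obs_dim] + [cfg["hidden_size"]] * cfg["n_hidden_layers"] + [out_dim]
```

For example, a policy network over a 10-dimensional observation with 2 actions would use widths [10, 64, 64, 64, 2] under the virtual-environment config.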