Benchmarking Constraint Inference in Inverse Reinforcement Learning

Authors: Guiliang Liu, Yudong Luo, Ashish Gaurav, Kasra Rezaee, Pascal Poupart

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on these algorithms under our benchmark and show how they can facilitate studying important research challenges for ICRL.
Researcher Affiliation | Collaboration | Guiliang Liu (1,2,3), Yudong Luo (2,3), Ashish Gaurav (2,3), Kasra Rezaee (4), Pascal Poupart (2,3); affiliations: 1 The Chinese University of Hong Kong, Shenzhen; 2 University of Waterloo; 3 Vector Institute; 4 Huawei
Pseudocode | Yes | Algorithm 1: Proximal Policy Optimization Lagrange (PPO-Lag); a generic sketch of the Lagrangian update appears after this table.
Open Source Code | Yes | The benchmark, including the instructions for reproducing ICRL algorithms, is available at https://github.com/Guiliang/ICRL-benchmarks-public.
Open Datasets | Yes | This environment is constructed by utilizing the HighD dataset (Krajewski et al., 2018).
Dataset Splits | No | No explicit information on dataset validation splits (e.g., percentages, sample counts for a validation set, or clear references to predefined validation splits) was found.
Hardware Specification | Yes | The cluster has multiple kinds of GPUs, including Tesla T4 with 16 GB memory, Tesla P100 with 12 GB memory, and RTX 6000 with 24 GB memory. We used machines with 12 GB of memory for training the ICRL models.
Software Dependencies | No | The paper mentions using MuJoCo (Todorov et al., 2012) and CommonRoad-RL (Wang et al., 2021), and refers to a GitHub repository for configurations. However, it does not explicitly state specific version numbers for these or other software libraries/dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In the virtual environments, we set 1) the batch size of PPO-Lag to 64, 2) the size of the hidden layer to 64, and 3) the number of hidden layers for the policy function, the value function, and the cost function to 3. ... The random seeds of virtual environments are 123, 321, 456, 654, and 666. (These values are mirrored in the configuration sketch after this table.)
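For orientation, the PPO-Lag algorithm cited in the Pseudocode row combines PPO's surrogate objective with dual ascent on a Lagrange multiplier for the cost constraint. The sketch below is a minimal illustration of that general idea only, not the paper's Algorithm 1; the helper names (`combined_advantage`, `update_multiplier`) and the learning rate `lr_lam` are assumptions made for this example.

```python
# Illustrative sketch of the Lagrangian machinery behind PPO-Lag.
# NOT the paper's Algorithm 1 verbatim; names and lr_lam are assumed.

def combined_advantage(reward_adv, cost_adv, lam):
    """Advantage fed to the PPO surrogate loss: reward minus lam-weighted cost."""
    return (reward_adv - lam * cost_adv) / (1.0 + lam)

def update_multiplier(lam, avg_episode_cost, cost_budget, lr_lam=0.5):
    """Dual ascent: raise lam when the average rollout cost exceeds the budget."""
    return max(0.0, lam + lr_lam * (avg_episode_cost - cost_budget))

# Toy loop: lam grows while the (fake) rollout cost violates a budget of 1.0,
# which in turn shifts the combined advantage toward penalising cost.
lam = 0.0
for avg_cost in [3.0, 2.0, 1.0, 0.5]:
    lam = update_multiplier(lam, avg_cost, cost_budget=1.0)
    print(f"avg_cost={avg_cost:.1f}  lambda={lam:.2f}  "
          f"combined_adv(r=1, c=1)={combined_advantage(1.0, 1.0, lam):.2f}")
```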
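The Experiment Setup row translates directly into a small configuration block. The snippet below simply restates the quoted values in code form; the key names are hypothetical and do not necessarily match the configuration files in the benchmark repository.

```python
# Hypothetical config mirroring the quoted virtual-environment setup.
# Key names are illustrative, not the benchmark's actual config keys.
virtual_env_config = {
    "ppo_lag_batch_size": 64,        # 1) batch size of PPO-Lag
    "hidden_layer_size": 64,         # 2) width of each hidden layer
    "num_hidden_layers": {           # 3) depth of each network
        "policy": 3,
        "value": 3,
        "cost": 3,
    },
    "random_seeds": [123, 321, 456, 654, 666],
}

# Example use: enumerate the seeds that would define the five training runs.
for seed in virtual_env_config["random_seeds"]:
    print(f"launching virtual-environment run with seed {seed}")
```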