Benchmarking Constraint Inference in Inverse Reinforcement Learning

Authors: Guiliang Liu, Yudong Luo, Ashish Gaurav, Kasra Rezaee, Pascal Poupart

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on these algorithms under our benchmark and show how they can facilitate studying important research challenges for ICRL.
Researcher Affiliation | Collaboration | Guiliang Liu (1,2,3), Yudong Luo (2,3), Ashish Gaurav (2,3), Kasra Rezaee (4), Pascal Poupart (2,3); affiliations: 1 The Chinese University of Hong Kong, Shenzhen; 2 University of Waterloo; 3 Vector Institute; 4 Huawei
Pseudocode | Yes | Algorithm 1: Proximal Policy Optimization Lagrange (PPO-Lag); a generic sketch of the Lagrangian update appears after this table.
Open Source Code | Yes | The benchmark, including the instructions for reproducing ICRL algorithms, is available at https://github.com/Guiliang/ICRL-benchmarks-public.
Open Datasets | Yes | This environment is constructed by utilizing the HighD dataset (Krajewski et al., 2018).
Dataset Splits | No | No explicit information on dataset validation splits (e.g., percentages, sample counts for a validation set, or clear references to predefined validation splits) was found.
Hardware Specification | Yes | The cluster has multiple kinds of GPUs, including Tesla T4 with 16 GB memory, Tesla P100 with 12 GB memory, and RTX 6000 with 24 GB memory. We used machines with 12 GB of memory for training the ICRL models.
Software Dependencies | No | The paper mentions using MuJoCo (Todorov et al., 2012) and CommonRoad-RL (Wang et al., 2021), and refers to a GitHub repository for configurations. However, it does not explicitly state specific version numbers for these or other software libraries/dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In the virtual environments, we set 1) the batch size of PPO-Lag to 64, 2) the size of the hidden layer to 64, and 3) the number of hidden layers for the policy function, the value function, and the cost function to 3. ... The random seeds of virtual environments are 123, 321, 456, 654, and 666. (These values are mirrored in the configuration sketch after this table.)
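For orientation, the PPO-Lag algorithm cited in the Pseudocode row combines PPO's surrogate objective with dual ascent on a Lagrange multiplier for the cost constraint. The sketch below is a minimal illustration of that general idea only, not the paper's Algorithm 1; the helper names (`combined_advantage`, `update_multiplier`) and the learning rate `lr_lam` are assumptions made for this example.

```python
# Illustrative sketch of the Lagrangian machinery behind PPO-Lag.
# NOT the paper's Algorithm 1 verbatim; names and lr_lam are assumed.

def combined_advantage(reward_adv, cost_adv, lam):
    """Advantage fed to the PPO surrogate loss: reward minus lam-weighted cost."""
    return (reward_adv - lam * cost_adv) / (1.0 + lam)

def update_multiplier(lam, avg_episode_cost, cost_budget, lr_lam=0.5):
    """Dual ascent: raise lam when the average rollout cost exceeds the budget."""
    return max(0.0, lam + lr_lam * (avg_episode_cost - cost_budget))

# Toy loop: lam grows while the (fake) rollout cost violates a budget of 1.0,
# which in turn shifts the combined advantage toward penalising cost.
lam = 0.0
for avg_cost in [3.0, 2.0, 1.0, 0.5]:
    lam = update_multiplier(lam, avg_cost, cost_budget=1.0)
    print(f"avg_cost={avg_cost:.1f}  lambda={lam:.2f}  "
          f"combined_adv(r=1, c=1)={combined_advantage(1.0, 1.0, lam):.2f}")
```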
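The Experiment Setup row translates directly into a small configuration block. The snippet below simply restates the quoted values in code form; the key names are hypothetical and do not necessarily match the configuration files in the benchmark repository.

```python
# Hypothetical config mirroring the quoted virtual-environment setup.
# Key names are illustrative, not the benchmark's actual config keys.
virtual_env_config = {
    "ppo_lag_batch_size": 64,        # 1) batch size of PPO-Lag
    "hidden_layer_size": 64,         # 2) width of each hidden layer
    "num_hidden_layers": {           # 3) depth of each network
        "policy": 3,
        "value": 3,
        "cost": 3,
    },
    "random_seeds": [123, 321, 456, 654, 666],
}

# Example use: enumerate the seeds that would define the five training runs.
for seed in virtual_env_config["random_seeds"]:
    print(f"launching virtual-environment run with seed {seed}")
```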