Benchmarking Constraint Inference in Inverse Reinforcement Learning
Authors: Guiliang Liu, Yudong Luo, Ashish Gaurav, Kasra Rezaee, Pascal Poupart
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on these algorithms under our benchmark and show how they can facilitate studying important research challenges for ICRL. |
| Researcher Affiliation | Collaboration | Guiliang Liu (1,2,3), Yudong Luo (2,3), Ashish Gaurav (2,3), Kasra Rezaee (4), Pascal Poupart (2,3). Affiliations: (1) The Chinese University of Hong Kong, Shenzhen; (2) University of Waterloo; (3) Vector Institute; (4) Huawei |
| Pseudocode | Yes | Algorithm 1: Proximal Policy Optimization Lagrange (PPO-Lag) |
| Open Source Code | Yes | The benchmark, including the instructions for reproducing ICRL algorithms, is available at https://github.com/Guiliang/ICRL-benchmarks-public. |
| Open Datasets | Yes | This environment is constructed by utilizing the HighD dataset (Krajewski et al., 2018). |
| Dataset Splits | No | No explicit information on dataset validation splits (e.g., percentages, sample counts for a validation set, or clear references to predefined validation splits) was found. |
| Hardware Specification | Yes | The cluster has multiple kinds of GPUs, including Tesla T4 with 16 GB memory, Tesla P100 with 12 GB memory, and RTX 6000 with 24 GB memory. We used machines with 12 GB of memory for training the ICRL models. |
| Software Dependencies | No | The paper mentions using MuJoCo (Todorov et al., 2012) and CommonRoad RL (Wang et al., 2021), and refers to a GitHub repository for configurations. However, it does not state version numbers for these or for other software dependencies (e.g., Python, PyTorch, or TensorFlow). |
| Experiment Setup | Yes | In the virtual environments, we set 1) the batch size of PPO-Lag to 64, 2) the size of the hidden layer to 64, and 3) the number of hidden layers for the policy function, the value function, and the cost function to 3. ... The random seeds of virtual environments are 123, 321, 456, 654, and 666. |
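
The virtual-environment settings quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch only: the class name `VirtualEnvSetup`, the helper functions, and the example input and output dimensions are hypothetical and are not taken from the paper or its repository, whose actual configuration format may differ. Only the numeric values (batch size 64, hidden size 64, 3 hidden layers, and the five seeds) come from the quoted text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VirtualEnvSetup:
    """Hypothetical container for the quoted virtual-environment settings."""
    batch_size: int = 64          # PPO-Lag batch size (from the quoted setup)
    hidden_size: int = 64         # width of each hidden layer
    num_hidden_layers: int = 3    # shared by policy, value, and cost functions
    seeds: Tuple[int, ...] = (123, 321, 456, 654, 666)


def mlp_layer_sizes(input_dim: int, output_dim: int,
                    setup: VirtualEnvSetup) -> List[int]:
    """Layer widths of an MLP built from the setup (dimensions are assumptions)."""
    return [input_dim] + [setup.hidden_size] * setup.num_hidden_layers + [output_dim]


def run_configs(setup: VirtualEnvSetup) -> List[Dict[str, int]]:
    """One run description per random seed, so every seed shares the same settings."""
    return [
        {
            "seed": seed,
            "batch_size": setup.batch_size,
            "hidden_size": setup.hidden_size,
            "num_hidden_layers": setup.num_hidden_layers,
        }
        for seed in setup.seeds
    ]


if __name__ == "__main__":
    setup = VirtualEnvSetup()
    # Example with assumed dimensions: a 17-dimensional observation, scalar output.
    print(mlp_layer_sizes(17, 1, setup))
    for cfg in run_configs(setup):
        print(cfg)
```

Keeping the seeds in the same object as the network sizes makes it harder to rerun one seed with settings that differ from the others. The elided part of the quoted setup ("...") is not covered here.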