Learning Soft Constraints From Constrained Expert Demonstrations

Authors: Ashish Gaurav, Kasra Rezaee, Guiliang Liu, Pascal Poupart

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate our approach on synthetic environments, robotics environments and real world highway driving scenarios. ... We conduct several experiments on the following environments: (a) Gridworld (A, B), which are 7x7 gridworld environments, (b) Cart Pole (MR or Move Right, Mid), which are variants of the Cart Pole environment from OpenAI Gym (Brockman et al., 2016), (c) Highway driving environment based on the HighD dataset (Krajewski et al., 2018), (d) MuJoCo robotics environments (Ant-Constrained, HalfCheetah-Constrained), and (e) Highway lane change environment based on the ExiD dataset (Moers et al., 2022). ... We define two metrics for our experiments: (a) Constraint Mean Squared Error (CMSE), which is the mean squared error between the true constraint and the recovered constraint, and (b) Normalized Accrual Dissimilarity (NAD)... Our results are reported in Tables 1, 2, 5 and 6."
Researcher Affiliation | Collaboration | Ashish Gaurav (1,2), Kasra Rezaee (3), Guiliang Liu (4), Pascal Poupart (1,2). Affiliations: (1) Cheriton School of Computer Science, University of Waterloo, Canada; (2) Vector Institute, Toronto, Canada; (3) Huawei Technologies Canada; (4) School of Data Science, The Chinese University of Hong Kong, Shenzhen.
Pseudocode | Yes | "Algorithm 1 INVERSE-CONSTRAINT-LEARNING ... Algorithm 2 CONSTRAINED-RL ... Algorithm 3 CONSTRAINT-ADJUSTMENT"
Open Source Code | No | The paper mentions adapting environments from external repositories (e.g., "yrlu's repository (Lu, 2019)") but does not provide a statement or link for the source code of the proposed method.
Open Datasets | Yes | "We conduct several experiments on the following environments: (a) Gridworld (A, B), which are 7x7 gridworld environments, (b) Cart Pole (MR or Move Right, Mid), which are variants of the Cart Pole environment from OpenAI Gym (Brockman et al., 2016), (c) Highway driving environment based on the HighD dataset (Krajewski et al., 2018), (d) MuJoCo robotics environments (Ant-Constrained, HalfCheetah-Constrained), and (e) Highway lane change environment based on the ExiD dataset (Moers et al., 2022)."
Dataset Splits | No | The paper does not explicitly specify dataset splits (e.g., percentages or counts for the training, validation, and test sets). While evaluation on datasets is performed, the specific partitioning used for validation is not detailed.
Hardware Specification | No | The paper mentions "CPU/GPU load" when discussing training times ("the training times may depend on CPU/GPU load"), but it does not specify particular CPU or GPU models or any other hardware details used to run the experiments.
Software Dependencies | No | The paper mentions several software components and libraries, such as OpenAI Gym (Brockman et al., 2016), the PPO algorithm (Schulman et al., 2017), and the Python Optimal Transport library (Flamary et al., 2021), but it does not provide version numbers for these dependencies.
Experiment Setup | Yes | "The hyperparameter configuration (e.g., choice of λ), training strategy and training time statistics are elaborated in Appendix C." Sections C.4–C.7 list common hyperparameters and those for GAIL-Constraint, ICRL, and ICL (the proposed method); Tables 7, 8, 9, 10, 11 provide specific values for learning rates, hidden layer sizes, discount factors, PPO steps per epoch, constraint update epochs, and soft loss coefficients.
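Of the two metrics quoted in the Research Type row, CMSE is a plain mean squared error between the true and recovered constraint functions evaluated on the same set of states. A minimal sketch follows; the function name, array shapes, and the 7x7 grid example are illustrative assumptions, not taken from the paper's code (the NAD metric is elided here since its definition is truncated in the quote above).

```python
import numpy as np

def constraint_mse(true_constraint, recovered_constraint):
    """Mean squared error between true and recovered constraint values.

    Both arguments hold constraint values evaluated on the same states
    (e.g., the cells of a gridworld). Names and shapes are illustrative;
    this is a sketch of the metric's definition, not the paper's code.
    """
    true_c = np.asarray(true_constraint, dtype=float)
    recovered_c = np.asarray(recovered_constraint, dtype=float)
    return float(np.mean((true_c - recovered_c) ** 2))

# Example: a hypothetical 7x7 gridworld with one constrained cell.
true_c = np.zeros((7, 7))
true_c[3, 3] = 1.0
# A perfect recovery yields zero error.
assert constraint_mse(true_c, true_c) == 0.0
```

Lower CMSE means the recovered constraint is numerically closer to the ground-truth constraint; it says nothing by itself about whether the learned policy actually avoids the constrained states, which is what the accrual-based metric is for.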