Learning Shared Safety Constraints from Multi-task Demonstrations

Authors: Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Zhiwei Steven Wu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our method with simulation experiments on high-dimensional continuous control tasks."
Researcher Affiliation | Academia | Konwoo Kim (Carnegie Mellon University), Gokul Swamy (Carnegie Mellon University), Zuxin Liu (Carnegie Mellon University), Ding Zhao (Carnegie Mellon University), Sanjiban Choudhury (Cornell University), Zhiwei Steven Wu (Carnegie Mellon University)
Pseudocode | Yes | Algorithm 1 CRL (Constrained Reinforcement Learning):
    Input: reward r, constraint c, learning rates η_{1:N}, tolerance δ
    Output: trained policy π
    Initialize λ_1 = 0
    for i in 1 … N do
        π_i ← RL(r − λ_i c)
        λ_{i+1} ← [λ_i + η_i (J(π_i, c) − δ)]_+
    end for
    Return Unif(π_{1:N})
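The dual-ascent loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `rl_solver` and `expected_cost` below are hypothetical stand-ins for the RL subroutine RL(r − λc) and the cost estimate J(π, c).

```python
def crl(rl_solver, expected_cost, etas, delta):
    """Sketch of CRL (Algorithm 1): Lagrangian dual ascent on lambda.

    rl_solver(lmbda)      -> policy trained on the penalized reward r - lmbda * c
    expected_cost(policy) -> estimate of J(policy, c)
    etas                  -> learning rates eta_1, ..., eta_N
    delta                 -> constraint tolerance
    """
    policies = []
    lmbda = 0.0  # lambda_1 = 0
    for eta in etas:
        pi = rl_solver(lmbda)  # pi_i <- RL(r - lambda_i * c)
        policies.append(pi)
        # lambda_{i+1} <- [lambda_i + eta_i * (J(pi_i, c) - delta)]_+
        lmbda = max(0.0, lmbda + eta * (expected_cost(pi) - delta))
    # Algorithm 1 returns the uniform mixture Unif(pi_{1:N}) over these policies.
    return policies
```

With toy stand-ins (a "policy" whose cost shrinks as λ grows), the loop drives the constraint violation down: λ increases whenever the current policy's expected cost exceeds the tolerance δ, and the projection `max(0.0, ...)` keeps the multiplier nonnegative.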
Open Source Code | Yes | "We release the code we used for all of our experiments at https://github.com/konwook/mticl."
Open Datasets | Yes | "For our multi-task experiments, we build upon the D4RL [Fu et al., 2020] Ant Maze benchmark."
Dataset Splits | No | The paper mentions "Return best of c_{1:N} on validation data" in Algorithms 2 and 3, indicating the use of validation data. However, it does not specify the size, proportion, or method of creating the validation split (e.g., percentages or exact counts) from the datasets used.
Hardware Specification | Yes | "We used a single NVIDIA 3090 GPU for all experiments."
Software Dependencies | No | The paper mentions using Tianshou and PPO implementations, as well as the PyBullet and MuJoCo environments, citing their respective papers. However, it does not provide version numbers for these software components (e.g., the Tianshou or PyBullet version), which are necessary for full reproducibility.
Experiment Setup | Yes | The key experimental hyperparameters are shown in Table 1. The exact configuration used for the experiments is available at https://github.com/konwook/mticl/blob/main/mticl/utils/config.py. Table 1 lists specific values for parameters such as "PPO Learning Rate 0.0003", "PPO Batch Size 512", "Steps per Epoch 20000", and various "ICL Epochs" and "Cost Limit" values.
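For illustration, the Table 1 values quoted above could be gathered into a configuration fragment like the following sketch. The key names here are hypothetical; the authoritative configuration lives in `mticl/utils/config.py` in the released repository.

```python
# Hypothetical config sketch collecting the hyperparameters quoted from Table 1.
# Only the values are taken from the paper; the key names are illustrative.
ppo_config = {
    "learning_rate": 0.0003,   # "PPO Learning Rate 0.0003"
    "batch_size": 512,         # "PPO Batch Size 512"
    "steps_per_epoch": 20000,  # "Steps per Epoch 20000"
}
```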