Learning Shared Safety Constraints from Multi-task Demonstrations
Authors: Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Zhiwei Steven Wu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method with simulation experiments on high-dimensional continuous control tasks. |
| Researcher Affiliation | Academia | Konwoo Kim (Carnegie Mellon University); Gokul Swamy (Carnegie Mellon University); Zuxin Liu (Carnegie Mellon University); Ding Zhao (Carnegie Mellon University); Sanjiban Choudhury (Cornell University); Zhiwei Steven Wu (Carnegie Mellon University) |
| Pseudocode | Yes | Algorithm 1 CRL (Constrained Reinforcement Learning). Input: reward r, constraint c, learning rates η_{1:N}, tolerance δ. Output: trained policy π. Initialize λ_1 = 0. For i in 1…N: π_i ← RL(r = r − λ_i c); λ_{i+1} ← [λ_i + η_i(J(π_i, c) − δ)]_+. Return Unif(π_{1:N}). (A runnable sketch of this loop follows the table.) |
| Open Source Code | Yes | We release the code we used for all of our experiments at https://github.com/konwook/mticl. |
| Open Datasets | Yes | For our multi-task experiments, we build upon the D4RL [Fu et al., 2020] Ant Maze benchmark. |
| Dataset Splits | No | Algorithms 2 and 3 include the step 'Return best of c_{1:N} on validation data', indicating that validation data is used. However, the paper does not specify the size, proportion, or method of creating the validation split (e.g., percentages or exact counts) from the datasets used. (A model-selection sketch follows the table.) |
| Hardware Specification | Yes | We used a single NVIDIA 3090 GPU for all experiments. |
| Software Dependencies | No | The paper mentions using 'Tianshou' and 'PPO' implementations, and environments such as 'PyBullet' and 'MuJoCo', citing their respective papers. However, it does not provide version numbers for these software components (e.g., the Tianshou or PyBullet version), which are needed for full reproducibility. |
| Experiment Setup | Yes | The key experimental hyperparameters are shown in Table 1. The exact configuration we use for our experiments is available at https://github.com/konwook/mticl/blob/main/mticl/utils/config.py. Table 1 lists specific values such as 'PPO Learning Rate 0.0003', 'PPO Batch Size 512', and 'Steps per Epoch 20000', along with various 'ICL Epochs' and 'Cost Limit' values. (A config sketch follows the table.) |
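
As a complement to the quoted pseudocode for Algorithm 1, here is a minimal Python sketch of the CRL dual-ascent loop. The helpers `rl_solver` and `J` are hypothetical stand-ins for a black-box RL routine and a policy cost evaluator; neither name comes from the paper or its released code.

```python
def crl(rl_solver, J, r, c, etas, delta, N):
    """Minimal sketch of Algorithm 1 (CRL): Lagrangian dual ascent.

    rl_solver: hypothetical black-box RL routine mapping a reward
               function to an (approximately) optimal policy.
    J:         hypothetical evaluator, J(pi, c) = expected cumulative
               cost of policy pi under constraint function c.
    r, c:      reward and constraint functions over (state, action).
    etas:      per-iteration dual learning rates eta_1..eta_N.
    delta:     cost tolerance.
    """
    lam = 0.0        # initialize the Lagrange multiplier, lambda_1 = 0
    policies = []
    for i in range(N):
        # Inner step: solve unconstrained RL on the Lagrangian reward
        # r - lambda_i * c (capture the current multiplier by value).
        lam_i = lam
        pi = rl_solver(lambda s, a: r(s, a) - lam_i * c(s, a))
        policies.append(pi)
        # Dual ascent: raise lambda if pi's cost exceeds the tolerance
        # delta, projecting back onto lambda >= 0 (the [.]_+ operator).
        lam = max(0.0, lam + etas[i] * (J(pi, c) - delta))
    # The algorithm returns Unif(pi_1:N): at deployment, sample one of
    # the stored policies uniformly at random.
    return policies
```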
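
The 'Return best of c_{1:N} on validation data' step noted in the Dataset Splits row could be realized as simple held-out model selection. A minimal sketch, assuming a user-supplied `violation_score` that measures how well a candidate constraint agrees with held-out demonstrations; both the split and the scoring rule are assumptions, since the paper reports neither.

```python
def select_constraint(candidates, validation_demos, violation_score):
    # Hypothetical model-selection step: score each learned constraint
    # c_1..c_N on held-out demonstrations and keep the best-scoring one.
    # The scoring rule and the validation split are assumptions; the
    # paper does not specify either.
    return min(candidates, key=lambda c: violation_score(c, validation_demos))
```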
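
For reference, the hyperparameters quoted from Table 1 in the Experiment Setup row could be collected in a small config object. This sketch reproduces only the values explicitly quoted above; the field names are illustrative assumptions, not identifiers from the authors' config.py, which contains further settings.

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    # Only the values explicitly quoted from Table 1 are reproduced here;
    # field names are illustrative, not the authors' identifiers.
    ppo_learning_rate: float = 3e-4   # "PPO Learning Rate 0.0003"
    ppo_batch_size: int = 512         # "PPO Batch Size 512"
    steps_per_epoch: int = 20_000     # "Steps per Epoch 20000"
    # "ICL Epochs" and "Cost Limit" vary by task in Table 1 and are
    # therefore left without defaults here.
```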