Learning Shared Safety Constraints from Multi-task Demonstrations
Authors: Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Zhiwei Steven Wu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method with simulation experiments on high-dimensional continuous control tasks. |
| Researcher Affiliation | Academia | Konwoo Kim (Carnegie Mellon University); Gokul Swamy (Carnegie Mellon University); Zuxin Liu (Carnegie Mellon University); Ding Zhao (Carnegie Mellon University); Sanjiban Choudhury (Cornell University); Zhiwei Steven Wu (Carnegie Mellon University) |
| Pseudocode | Yes | Algorithm 1 CRL (Constrained Reinforcement Learning). Input: reward r, constraint c, learning rates η_{1:N}, tolerance δ. Output: trained policy π. Initialize λ_1 = 0. For i in 1…N: π_i ← RL(r = r − λ_i c); λ_{i+1} ← [λ_i + η_i(J(π_i, c) − δ)]_+. Return Unif(π_{1:N}). (A runnable sketch of this loop follows the table.) |
| Open Source Code | Yes | We release the code we used for all of our experiments at https://github.com/konwook/mticl. |
| Open Datasets | Yes | For our multi-task experiments, we build upon the D4RL [Fu et al., 2020] Ant Maze benchmark. |
| Dataset Splits | No | Algorithms 2 and 3 include the step 'Return best of c_{1:N} on validation data', indicating that validation data is used. However, the paper does not specify the size, proportion, or method of creating the validation split (e.g., percentages or exact counts) from the datasets used. (A model-selection sketch follows the table.) |
| Hardware Specification | Yes | We used a single NVIDIA 3090 GPU for all experiments. |
| Software Dependencies | No | The paper mentions using 'Tianshou' and 'PPO' implementations, and environments such as 'PyBullet' and 'MuJoCo', citing their respective papers. However, it does not provide version numbers for these software components (e.g., the Tianshou or PyBullet version), which are needed for full reproducibility. |
| Experiment Setup | Yes | The key experimental hyperparameters are shown in Table 1. The exact configuration we use for our experiments is available at https://github.com/konwook/mticl/blob/main/mticl/utils/config.py. Table 1 lists specific values such as 'PPO Learning Rate 0.0003', 'PPO Batch Size 512', and 'Steps per Epoch 20000', along with various 'ICL Epochs' and 'Cost Limit' values. (A config sketch follows the table.) |
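
As a complement to the quoted pseudocode for Algorithm 1, here is a minimal Python sketch of the CRL dual-ascent loop. The helpers `rl_solver` and `J` are hypothetical stand-ins for a black-box RL routine and a policy cost evaluator; neither name comes from the paper or its released code.

```python
def crl(rl_solver, J, r, c, etas, delta, N):
    """Minimal sketch of Algorithm 1 (CRL): Lagrangian dual ascent.

    rl_solver: hypothetical black-box RL routine mapping a reward
               function to an (approximately) optimal policy.
    J:         hypothetical evaluator, J(pi, c) = expected cumulative
               cost of policy pi under constraint function c.
    r, c:      reward and constraint functions over (state, action).
    etas:      per-iteration dual learning rates eta_1..eta_N.
    delta:     cost tolerance.
    """
    lam = 0.0        # initialize the Lagrange multiplier, lambda_1 = 0
    policies = []
    for i in range(N):
        # Inner step: solve unconstrained RL on the Lagrangian reward
        # r - lambda_i * c (capture the current multiplier by value).
        lam_i = lam
        pi = rl_solver(lambda s, a: r(s, a) - lam_i * c(s, a))
        policies.append(pi)
        # Dual ascent: raise lambda if pi's cost exceeds the tolerance
        # delta, projecting back onto lambda >= 0 (the [.]_+ operator).
        lam = max(0.0, lam + etas[i] * (J(pi, c) - delta))
    # The algorithm returns Unif(pi_1:N): at deployment, sample one of
    # the stored policies uniformly at random.
    return policies
```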
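
The 'Return best of c_{1:N} on validation data' step noted in the Dataset Splits row could be realized as simple held-out model selection. A minimal sketch, assuming a user-supplied `violation_score` that measures how well a candidate constraint agrees with held-out demonstrations; both the split and the scoring rule are assumptions, since the paper reports neither.

```python
def select_constraint(candidates, validation_demos, violation_score):
    # Hypothetical model-selection step: score each learned constraint
    # c_1..c_N on held-out demonstrations and keep the best-scoring one.
    # The scoring rule and the validation split are assumptions; the
    # paper does not specify either.
    return min(candidates, key=lambda c: violation_score(c, validation_demos))
```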
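
For reference, the hyperparameters quoted from Table 1 in the Experiment Setup row could be collected in a small config object. This sketch reproduces only the values explicitly quoted above; the field names are illustrative assumptions, not identifiers from the authors' config.py, which contains further settings.

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    # Only the values explicitly quoted from Table 1 are reproduced here;
    # field names are illustrative, not the authors' identifiers.
    ppo_learning_rate: float = 3e-4   # "PPO Learning Rate 0.0003"
    ppo_batch_size: int = 512         # "PPO Batch Size 512"
    steps_per_epoch: int = 20_000     # "Steps per Epoch 20000"
    # "ICL Epochs" and "Cost Limit" vary by task in Table 1 and are
    # therefore left without defaults here.
```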