reproducibilityindex.ai

Projection-Based Constrained Policy Optimization

Authors: Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge

ICLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our empirical results over several control tasks demonstrate that PCPO achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
Researcher Affiliation	Collaboration	Tsung-Yen Yang, Princeton University (ty3@princeton.edu); Justinian Rosca, Siemens Corporation, Corporate Technology (justinian.rosca@siemens.com); Karthik Narasimhan, Princeton University (karthikn@princeton.edu); Peter J. Ramadge, Princeton University (ramadge@princeton.edu)
Pseudocode	Yes	Algorithm 1 Projection-Based Constrained Policy Optimization (PCPO)
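The Algorithm 1 row only confirms that pseudocode exists. For orientation, here is a minimal numpy sketch of the closed-form PCPO update as we read it from the paper. It takes a TRPO-style natural-gradient step on the reward inside the KL trust region. It then projects the intermediate parameters back onto the linearized cost constraint, using either the KL metric (L = H) or the L2 norm (L = I). The function name `pcpo_update`, its arguments, and the dense matrix solves are illustrative assumptions, not the authors' code. At realistic scale, H^-1 g would normally be approximated (for example with conjugate gradient) rather than computed from an explicit Fisher matrix.

```python
import numpy as np


def pcpo_update(theta, g, a, b, H, delta, projection="kl"):
    """Hypothetical sketch of one closed-form PCPO update (not the authors' code).

    theta: current policy parameters theta_k, shape (n,)
    g: gradient of the expected reward at theta_k, shape (n,)
    a: gradient of the expected cost at theta_k, shape (n,)
    b: constraint value J_C(pi_k) - h; b > 0 means the constraint is violated
    H: Fisher information matrix at theta_k, shape (n, n), assumed positive definite
    delta: KL trust-region step size
    projection: "kl" projects with L = H, "l2" projects with L = I
    """
    # Step 1: natural-gradient reward step to the boundary of the
    # quadratic KL trust region (1/2) d^T H d <= delta, as in TRPO.
    Hinv_g = np.linalg.solve(H, g)
    step = np.sqrt(2.0 * delta / (g @ Hinv_g))
    theta_half = theta + step * Hinv_g

    # Step 2: project theta_half onto the linearized constraint set
    # {x : a^T (x - theta) + b <= 0}, measuring distance with the metric L.
    if projection == "kl":
        L = H
    elif projection == "l2":
        L = np.eye(theta.shape[0])
    else:
        raise ValueError(f"unknown projection: {projection!r}")
    Linv_a = np.linalg.solve(L, a)
    violation = a @ (theta_half - theta) + b  # equals step * a^T H^{-1} g + b
    coef = max(0.0, violation / (a @ Linv_a))  # zero if already feasible
    return theta_half - coef * Linv_a


if __name__ == "__main__":
    # Toy 2-D example with an identity Fisher matrix.
    new_theta = pcpo_update(
        theta=np.zeros(2),
        g=np.array([1.0, 0.0]),
        a=np.array([1.0, 0.0]),
        b=0.0,
        H=np.eye(2),
        delta=0.5,
    )
    print(new_theta)
```

In the toy example (H = I, g = a = (1, 0), b = 0, δ = 0.5), the reward step moves θ from (0, 0) to (1, 0). That point violates the linearized constraint by 1, so the projection pulls it back to (0, 0). The projection term vanishes whenever the reward step already satisfies the linearized constraint.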
Open Source Code	Yes	For code see the project website: https://sites.google.com/view/iclr2020-pcpo
Open Datasets	Yes	The first two tasks, Gather and Circle, are MuJoCo environments with safety constraints introduced by Achiam et al. (2017); the remaining two are traffic management tasks with fairness constraints introduced by Vinitsky et al. (2018).
Dataset Splits	No	The paper reports batch sizes and rollout lengths but does not define training/validation/test splits in the conventional sense.
Hardware Specification	No	No specific hardware (e.g., GPU models, CPU types, or cloud instance names) used for experiments is mentioned in the paper.
Software Dependencies	No	The experiments are implemented in rllab (Duan et al., 2016), a tool for developing and evaluating RL algorithms. No specific version numbers for rllab or other software dependencies are provided.
Experiment Setup	Yes	The hyperparameters of each task for all algorithms are as follows (PC: point circle, PG: point gather, AC: ant circle, AG: ant gather, Gr: grid, BN: bottleneck):

Parameter                      PC       PG       AC        AG        Gr       BN
Discount factor γ              0.995    0.995    0.995     0.995     0.999    0.999
Step size δ                    10^-4    10^-4    10^-4     10^-4     10^-4    10^-4
GAE λ for reward (λ_R^GAE)     0.95     0.95     0.95      0.95      0.97     0.97
GAE λ for cost (λ_C^GAE)       1.0      1.0      0.5       0.5       0.5      1.0
Batch size                     50,000   50,000   100,000   100,000   10,000   25,000
Rollout length                 50       15       500       500       400      500
Cost constraint threshold h    5        0.1      10        0.2       0        0
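When re-running the experiments, it may help to hold the hyperparameter table in a structured config so that values are not retyped per script. The sketch below does that, with the step size δ taken as 10^-4. The class `TaskConfig`, the dict `PCPO_HYPERPARAMS`, and the field names are hypothetical. The derived `full_rollouts_per_batch` property rests on one assumption: that batch size counts environment steps, as rllab's batch sampler does as far as we know.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical container for the per-task PCPO hyperparameters."""

    discount: float           # discount factor gamma
    step_size: float          # trust-region step size delta
    gae_lambda_reward: float  # GAE lambda for reward advantages
    gae_lambda_cost: float    # GAE lambda for cost advantages
    batch_size: int           # samples per iteration (assumed: environment steps)
    rollout_length: int       # maximum length of one rollout
    cost_threshold: float     # cost constraint threshold h

    @property
    def full_rollouts_per_batch(self) -> int:
        # Rollouts per batch if every episode runs to the full rollout length;
        # only meaningful under the assumption that batch_size counts steps.
        return self.batch_size // self.rollout_length


# Values transcribed from the per-task hyperparameter table.
PCPO_HYPERPARAMS = {
    "PC": TaskConfig(0.995, 1e-4, 0.95, 1.0, 50_000, 50, 5.0),
    "PG": TaskConfig(0.995, 1e-4, 0.95, 1.0, 50_000, 15, 0.1),
    "AC": TaskConfig(0.995, 1e-4, 0.95, 0.5, 100_000, 500, 10.0),
    "AG": TaskConfig(0.995, 1e-4, 0.95, 0.5, 100_000, 500, 0.2),
    "Gr": TaskConfig(0.999, 1e-4, 0.97, 0.5, 10_000, 400, 0.0),
    "BN": TaskConfig(0.999, 1e-4, 0.97, 1.0, 25_000, 500, 0.0),
}

if __name__ == "__main__":
    for task, cfg in PCPO_HYPERPARAMS.items():
        print(task, cfg.full_rollouts_per_batch)
```

Under that assumption, point circle would collect 1,000 full-length rollouts per iteration and grid only 25. This is one reason the batch sizes are worth reporting alongside the rollout lengths.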