Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Projection-Based Constrained Policy Optimization

Authors: Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results over several control tasks demonstrate that PCPO achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
Researcher Affiliation | Collaboration | Tsung-Yen Yang (Princeton University); Justinian Rosca (Siemens Corporation, Corporate Technology); Karthik Narasimhan (Princeton University); Peter J. Ramadge (Princeton University)
Pseudocode | Yes | Algorithm 1 Projection-Based Constrained Policy Optimization (PCPO)
Open Source Code | Yes | For code see the project website: https://sites.google.com/view/iclr2020-pcpo
Open Datasets | Yes | The first two tasks Gather and Circle are Mujoco environments with safety constraints introduced by Achiam et al. (2017) and two traffic management tasks with fairness constraints introduced by Vinitsky et al. (2018).
Dataset Splits | No | The paper mentions batch sizes and rollout lengths but does not explicitly provide training/validation/test dataset splits in the conventional sense for reproducibility.
Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, or cloud instance names) used for experiments is mentioned in the paper.
Software Dependencies | No | The experiments are implemented in rllab (Duan et al., 2016), a tool for developing and evaluating RL algorithms. No specific version numbers for rllab or other software dependencies are provided.
Experiment Setup | Yes | The hyperparameters of each task for all algorithms are as follows (PC: point circle, PG: point gather, AC: ant circle, AG: ant gather, Gr: grid, and BN: bottleneck tasks):

    Parameter                      PC       PG       AC       AG       Gr       BN
    Discount factor γ              0.995    0.995    0.995    0.995    0.999    0.999
    Step size δ                    10^-4    10^-4    10^-4    10^-4    10^-4    10^-4
    GAE λ for reward (λ_GAE^R)     0.95     0.95     0.95     0.95     0.97     0.97
    GAE λ for cost (λ_GAE^C)       1.0      1.0      0.5      0.5      0.5      1.0
    Batch size                     50,000   50,000   100,000  100,000  10,000   25,000
    Rollout length                 50       15       500      500      400      500
    Cost constraint threshold h    5        0.1      10       0.2      0        0
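As a reading aid for the Pseudocode and Experiment Setup entries, the following is a minimal sketch that encodes the hyperparameter table and one PCPO-style step, following the closed-form update described in the paper. The step has two parts: a TRPO-style reward-improvement step inside a KL trust region of size δ, then a projection onto the linearized cost constraint using either the KL distance (L = H) or the L2 distance (L = I). It is not the authors' rllab code. It assumes a small, dense Fisher matrix H solved exactly with NumPy (a practical implementation would typically use conjugate gradient instead), and it assumes batch size counts environment steps. The names TaskHyperparams, HYPERPARAMS, gae, and pcpo_update are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

import numpy as np


@dataclass(frozen=True)
class TaskHyperparams:
    gamma: float           # discount factor γ
    delta: float           # trust-region step size δ
    lam_reward: float      # GAE λ for the reward advantage
    lam_cost: float        # GAE λ for the cost advantage
    batch_size: int        # samples per policy update (assumed: environment steps)
    rollout_length: int    # maximum steps per trajectory
    cost_threshold: float  # constraint threshold h


# Values transcribed from the hyperparameter table above.
HYPERPARAMS: Dict[str, TaskHyperparams] = {
    "PC": TaskHyperparams(0.995, 1e-4, 0.95, 1.0, 50_000, 50, 5.0),
    "PG": TaskHyperparams(0.995, 1e-4, 0.95, 1.0, 50_000, 15, 0.1),
    "AC": TaskHyperparams(0.995, 1e-4, 0.95, 0.5, 100_000, 500, 10.0),
    "AG": TaskHyperparams(0.995, 1e-4, 0.95, 0.5, 100_000, 500, 0.2),
    "Gr": TaskHyperparams(0.999, 1e-4, 0.97, 0.5, 10_000, 400, 0.0),
    "BN": TaskHyperparams(0.999, 1e-4, 0.97, 1.0, 25_000, 500, 0.0),
}


def gae(rewards: List[float], values: List[float], gamma: float, lam: float) -> List[float]:
    """Generalized advantage estimates for one trajectory.

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`.
    Call once with rewards and λ_GAE^R, and once with costs and λ_GAE^C.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        td_error = rewards[t] + gamma * values[t + 1] - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages


def pcpo_update(theta, g, a, b, H, delta, projection="kl"):
    """One PCPO-style update with a dense Fisher matrix H (hypothetical helper).

    g: reward gradient, a: cost gradient, b = J_C(pi_k) - h.
    """
    # Step 1: reward improvement inside the trust region 0.5 * d^T H d <= delta.
    H_inv_g = np.linalg.solve(H, g)
    theta_half = theta + np.sqrt(2.0 * delta / (g @ H_inv_g)) * H_inv_g
    # Step 2: project onto the half-space a^T (x - theta) + b <= 0,
    # measuring distance with L = H (KL projection) or L = I (L2 projection).
    if projection == "kl":
        L = H
    elif projection == "l2":
        L = np.eye(len(theta))
    else:
        raise ValueError(f"unknown projection: {projection}")
    L_inv_a = np.linalg.solve(L, a)
    violation = a @ (theta_half - theta) + b
    # If the half-step already satisfies the constraint, no projection is applied.
    return theta_half - max(0.0, violation / (a @ L_inv_a)) * L_inv_a
```

For the point-circle task, for example, one would pass HYPERPARAMS["PC"].delta as delta. Reward and cost advantages would come from gae(..., hp.gamma, hp.lam_reward) and gae(..., hp.gamma, hp.lam_cost). The constraint term is b = J_C(π_k) - hp.cost_threshold.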