Constrained Policy Optimization

Authors: Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach on simulated robot locomotion tasks where the agent must satisfy constraints motivated by safety. In our experiments, we show that CPO can train neural network policies with thousands of parameters on high-dimensional simulated robot locomotion tasks to maximize rewards while successfully enforcing constraints.
Researcher Affiliation | Collaboration | UC Berkeley and OpenAI.
Pseudocode | Yes | Algorithm 1: Constrained Policy Optimization.
Open Source Code | Yes | We give the pseudocode for our algorithm (for the single-constraint case) as Algorithm 1, and have made our code implementation available online: https://github.com/jachiam/cpo
Open Datasets | No | The paper uses 'simulated robot locomotion tasks' in environments such as 'Point-Circle', 'Ant-Circle', 'Humanoid-Circle', 'Point-Gather', and 'Ant-Gather', which are custom simulation environments rather than standard public datasets with access information provided.
Dataset Splits | No | The paper does not provide dataset split information (e.g., percentages, sample counts, or references to predefined splits) for training, validation, or testing.
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | Our experiments are implemented in rllab (Duan et al., 2016). This names a framework but gives no version numbers for it or for the other libraries needed for replication.
Experiment Setup | No | For all experiments, we use neural network policies with two hidden layers of size (64, 32). This is a model architecture detail, but the paper does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or comprehensive system-level training configurations. A hedged sketch of such a policy network follows this table.
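
The quoted setup fixes only the policy architecture: two hidden layers of size (64, 32). As an illustration of what such a policy could look like, below is a minimal sketch of a Gaussian MLP policy. The use of PyTorch, tanh activations, and a state-independent log-std parameter are assumptions made here for illustration; the authors' released code builds on rllab, and nothing beyond the layer sizes is taken from the paper.

# Minimal sketch of a Gaussian MLP policy with hidden layers of size (64, 32),
# matching the architecture stated in the paper. Framework choice (PyTorch),
# tanh activations, and the shared log-std head are assumptions, not details
# from the paper or its rllab-based implementation.
import torch
import torch.nn as nn

class GaussianMLPPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 32), nn.Tanh(),
            nn.Linear(32, act_dim),
        )
        # Learned log standard deviation, shared across all states.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mean = self.mean_net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

# Example usage: sample an action for a hypothetical 10-dimensional observation
# and 3-dimensional action space.
policy = GaussianMLPPolicy(obs_dim=10, act_dim=3)
action = policy(torch.randn(10)).sample()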