reproducibilityindex.ai

First Order Constrained Optimization in Policy Space

Authors: Yiming Zhang, Quan Vuong, Keith Ross

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide empirical evidence that our simple approach achieves better performance on a set of constrained robotics locomotive tasks.
Researcher Affiliation	Academia	Yiming Zhang (New York University, yiming.zhang@cs.nyu.edu); Quan Vuong (UC San Diego, qvuong@ucsd.edu); Keith W. Ross (New York University Shanghai and New York University, keithwross@nyu.edu)
Pseudocode	Yes	Algorithm 1 presents a summary of the FOCOPS algorithm. A more detailed pseudocode is provided in Appendix F of the supplementary materials.
Open Source Code	No	The paper does not provide an explicit statement or link to the source code for the FOCOPS method described in the paper. It mentions that another author (Joshua Achiam) made his implementation of the CPO algorithm publicly available.
Open Datasets	Yes	Both sets of experiments are implemented using the OpenAI Gym API (Brockman et al., 2016) for the MuJoCo physical simulator (Todorov et al., 2012).
Dataset Splits	No	The paper discusses training and testing, but does not explicitly provide specific dataset split percentages, sample counts, or references to predefined splits for training, validation, and test sets. It mentions training on a 'fixed random seed' and testing on 'ten unseen random seeds' but not traditional data splits.
Hardware Specification	No	The paper acknowledges the 'NYU Shanghai High Performance Computing (HPC) administrator Zhiguo Qi and the HPC team at NYU' for technical support, but does not provide specific hardware details such as GPU or CPU models, or memory specifications.
Software Dependencies	No	The paper mentions 'OpenAI Gym API (Brockman et al., 2016)' and 'MuJoCo physical simulator (Todorov et al., 2012)' but does not provide specific version numbers for these or any other software dependencies required for reproduction.
Experiment Setup	Yes	The hyperparameter ν_max was selected via hyperparameter sweep on the set {1, 2, 3, 5, 10, +∞}. ... a fixed λ found through hyperparameter sweeps provides good results. ... ν − α(b − J_C(π_θk)) where α is the step size. ... We estimate the advantage functions using the Generalized Advantage Estimator (GAE) (Schulman et al., 2016). ... During training, we use the early stopping criteria...
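
The multiplier update quoted in the Experiment Setup row can be illustrated with a minimal sketch. The excerpt is partial, so the details here are assumptions rather than the authors' implementation: the step is taken to be a gradient step on ν with step size α, followed by clipping to the interval [0, ν_max]. The function name `update_nu` and its parameter names are hypothetical.

```python
import math


def update_nu(nu, cost_limit, est_cost, step_size, nu_max=math.inf):
    """One projected gradient step on the cost multiplier nu (illustrative only).

    nu         -- current multiplier value
    cost_limit -- the cost threshold b
    est_cost   -- an estimate of the policy's cost return J_C
    step_size  -- the step size alpha
    nu_max     -- upper bound on nu (assumed; math.inf disables the cap)
    """
    # Gradient step: nu grows when the estimated cost exceeds the limit b
    # and shrinks when the policy is within the limit.
    nu_new = nu - step_size * (cost_limit - est_cost)
    # Assumed projection: clip the result to the interval [0, nu_max].
    return min(max(nu_new, 0.0), nu_max)
```

Under these assumptions, a policy whose estimated cost exceeds the limit pushes ν upward until it reaches ν_max, while a policy that stays under the limit lets ν decay toward zero.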