First Order Constrained Optimization in Policy Space

Authors: Yiming Zhang, Quan Vuong, Keith Ross

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical evidence that our simple approach achieves better performance on a set of constrained robotics locomotive tasks.
Researcher Affiliation | Academia | Yiming Zhang (New York University, yiming.zhang@cs.nyu.edu); Quan Vuong (UC San Diego, qvuong@ucsd.edu); Keith W. Ross (New York University Shanghai and New York University, keithwross@nyu.edu)
Pseudocode | Yes | Algorithm 1 presents a summary of the FOCOPS algorithm. A more detailed pseudocode is provided in Appendix F of the supplementary materials.
Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the FOCOPS method it describes. It mentions only that another author (Joshua Achiam) made his implementation of the CPO algorithm publicly available.
Open Datasets | Yes | Both sets of experiments are implemented using the OpenAI Gym API (Brockman et al., 2016) for the MuJoCo physical simulator (Todorov et al., 2012).
Dataset Splits | No | The paper discusses training and testing, but does not provide dataset split percentages, sample counts, or references to predefined training, validation, and test splits. It mentions training on a 'fixed random seed' and testing on 'ten unseen random seeds', which are not traditional data splits.
Hardware Specification | No | The paper acknowledges the 'NYU Shanghai High Performance Computing (HPC) administrator Zhiguo Qi and the HPC team at NYU' for technical support, but does not provide hardware details such as GPU or CPU models or memory specifications.
Software Dependencies | No | The paper mentions the 'OpenAI Gym API (Brockman et al., 2016)' and the 'MuJoCo physical simulator (Todorov et al., 2012)' but does not provide version numbers for these or any other software dependencies required for reproduction.
Experiment Setup | Yes | The hyperparameter νmax was selected via hyperparameter sweep on the set {1, 2, 3, 5, 10, +∞}. ... a fixed λ found through hyperparameter sweeps provides good results. ... ν ← ν − α(b − J_C(π_θk)) where α is the step size. ... We estimate the advantage functions using the Generalized Advantage Estimator (GAE) (Schulman et al., 2016). ... During training, we use the early stopping criteria...
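
The Experiment Setup entry names two computations: a gradient step on the cost multiplier ν with step size α, and advantage estimation with GAE. The following is a minimal Python sketch of both, not the authors' implementation. The function names, the default values of gamma and lam, and the clipping of ν to the interval [0, νmax] are assumptions made for illustration; the source only states that νmax is a swept hyperparameter.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the source
    lam: float = 0.95,    # assumed GAE parameter, not taken from the source
) -> List[float]:
    """Generalized Advantage Estimation for a single trajectory.

    `values` must hold len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Accumulate discounted TD residuals from the end of the trajectory.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def update_nu(
    nu: float,
    cost_return: float,  # estimate of J_C for the current policy
    cost_limit: float,   # the constraint threshold b
    step_size: float,    # the step size alpha
    nu_max: float,       # upper bound on nu; float("inf") disables the cap
) -> float:
    """One gradient step on nu, then clipping to [0, nu_max] (assumed)."""
    nu = nu - step_size * (cost_limit - cost_return)
    return min(max(nu, 0.0), nu_max)
```

With this sign convention ν grows when the estimated cost exceeds the limit b and shrinks toward zero when the constraint is satisfied, so the cost penalty only tightens while the policy is violating the constraint.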