First Order Constrained Optimization in Policy Space
Authors: Yiming Zhang, Quan Vuong, Keith Ross
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical evidence that our simple approach achieves better performance on a set of constrained robotics locomotive tasks. |
| Researcher Affiliation | Academia | Yiming Zhang (New York University, yiming.zhang@cs.nyu.edu); Quan Vuong (UC San Diego, qvuong@ucsd.edu); Keith W. Ross (New York University Shanghai and New York University, keithwross@nyu.edu) |
| Pseudocode | Yes | Algorithm 1 presents a summary of the FOCOPS algorithm. A more detailed pseudocode is provided in Appendix F of the supplementary materials. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the FOCOPS method described in the paper. It mentions that another author (Joshua Achiam) made his implementation of the CPO algorithm publicly available. |
| Open Datasets | Yes | Both sets of experiments are implemented using the OpenAI Gym API (Brockman et al., 2016) for the MuJoCo physical simulator (Todorov et al., 2012). |
| Dataset Splits | No | The paper discusses training and testing, but does not explicitly provide specific dataset split percentages, sample counts, or references to predefined splits for training, validation, and test sets. It mentions training on a 'fixed random seed' and testing on 'ten unseen random seeds' but not traditional data splits. |
| Hardware Specification | No | The paper acknowledges the 'NYU Shanghai High Performance Computing (HPC) administrator Zhiguo Qi and the HPC team at NYU' for technical support, but does not provide specific hardware details such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym API (Brockman et al., 2016)' and 'MuJoCo physical simulator (Todorov et al., 2012)' but does not provide specific version numbers for these or any other software dependencies required for reproduction. |
| Experiment Setup | Yes | The hyperparameter ν_max was selected via hyperparameter sweep on the set {1, 2, 3, 5, 10, +∞}. ... a fixed λ found through hyperparameter sweeps provides good results. ... ν − α(b − J_C(π_θk)) where α is the step size. ... We estimate the advantage functions using the Generalized Advantage Estimator (GAE) (Schulman et al., 2016). ... During training, we use the early stopping criteria... |
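
The Experiment Setup row names two computations: a step on the cost multiplier ν with step size α, and advantage estimation with GAE. The following is a minimal Python sketch of both, not the authors' implementation. The function names, the default values of `gamma` and `lam`, and the clipping of ν to the interval [0, ν_max] are assumptions made for illustration; the row itself only states that ν_max is a swept hyperparameter.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final step.
    `gamma` and `lam` defaults are illustrative, not taken from the paper.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Accumulate discounted TD residuals from the end of the trajectory.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def update_nu(nu, cost_return, cost_limit, step_size, nu_max):
    """One multiplier step nu - alpha * (b - J_C).

    Assumption: the result is clipped to [0, nu_max]; pass
    float("inf") as nu_max for an unbounded multiplier.
    """
    stepped = nu - step_size * (cost_limit - cost_return)
    return min(max(stepped, 0.0), nu_max)
```

With this sign convention, ν grows when the estimated cost return `cost_return` exceeds the limit `cost_limit` and shrinks toward zero otherwise, so the penalty on constraint violation tightens only while the constraint is violated.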