Dual Policy Iteration

Authors: Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Bagnell

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of our approach on various continuous control Markov Decision Processes. To evaluate our approach, we demonstrate our algorithm on discrete MDPs and continuous control tasks. We tested our approach on several MDPs: (1) a set of random discrete MDPs (Garnet problems [7]) (2) Cartpole balancing [31], (3) Helicopter Aerobatics (Hover and Funnel) [32], (4) Swimmer, Hopper and Half-Cheetah from the MuJoCo physics simulator [33].
Researcher Affiliation | Collaboration | (1) School of Computer Science, Carnegie Mellon University, USA; (2) College of Computing, Georgia Institute of Technology, USA; (3) Aurora Innovation, USA
Pseudocode | Yes | Algorithm 1 AGGREVATED-GPS
Open Source Code | No | The paper does not provide concrete access to its own source code. It mentions 'Software available from rll.berkeley.edu/gps.' but this refers to a related work's implementation, not the authors' code for this paper.
Open Datasets | Yes | We tested our approach on several MDPs: (1) a set of random discrete MDPs (Garnet problems [7]) (2) Cartpole balancing [31], (3) Helicopter Aerobatics (Hover and Funnel) [32], (4) Swimmer, Hopper and Half-Cheetah from the MuJoCo physics simulator [33].
Dataset Splits | No | The paper mentions a train/test split of environments for robust policy optimization ('We use 7 of the environments for training and the remaining three for testing.'), but does not specify a separate validation set for model tuning or general performance assessment across all experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using the 'MuJoCo physics simulator [33]' but does not provide specific version numbers for this or any other software, libraries, or programming languages used in its implementation.
Experiment Setup | Yes | The setup is detailed in Appendix B.4. The setup and the conservative update implementation are detailed in Appendix B.1.
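
The "random discrete MDPs (Garnet problems [7])" listed above are not specified further in this summary. As a rough illustration only, the sketch below shows one common way a Garnet-style random MDP can be generated: each state-action pair gets a fixed number of randomly chosen successor states with random transition probabilities and a random reward. The function name `make_garnet`, its parameters, and the sizes used are hypothetical and are not taken from the paper.

```python
import random


def make_garnet(num_states, num_actions, branching, seed=0):
    """Build a random Garnet-style MDP (illustrative assumption, not the paper's setup).

    Returns (P, R): P[s][a] maps next_state -> probability, and
    R[s][a] is a scalar reward drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(num_states):
        p_row, r_row = [], []
        for _ in range(num_actions):
            # Pick `branching` distinct successor states.
            successors = rng.sample(range(num_states), branching)
            # Cut [0, 1] at branching - 1 random points; the gaps are the probabilities.
            cuts = sorted(rng.random() for _ in range(branching - 1))
            bounds = [0.0] + cuts + [1.0]
            probs = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
            p_row.append(dict(zip(successors, probs)))
            r_row.append(rng.random())
        P.append(p_row)
        R.append(r_row)
    return P, R


# Hypothetical sizes chosen only for the example.
P, R = make_garnet(num_states=10, num_actions=3, branching=2)
```

Fixing the seed makes the generated MDP repeatable, which matters here because the summary reports no released code or dataset files for these problems.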