Dual Policy Iteration
Authors: Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Bagnell
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach on various continuous control Markov Decision Processes. To evaluate our approach, we demonstrate our algorithm on discrete MDPs and continuous control tasks. We tested our approach on several MDPs: (1) a set of random discrete MDPs (Garnet problems [7]) (2) Cartpole balancing [31], (3) Helicopter Aerobatics (Hover and Funnel) [32], (4) Swimmer, Hopper and Half-Cheetah from the MuJoCo physics simulator [33]. |
| Researcher Affiliation | Collaboration | 1School of Computer Science, Carnegie Mellon University, USA 2College of Computing, Georgia Institute of Technology, USA 3Aurora Innovation, USA |
| Pseudocode | Yes | Algorithm 1 AGGREVATED-GPS |
| Open Source Code | No | The paper does not provide concrete access to its own source code. It mentions 'Software available from rll.berkeley.edu/gps.' but this refers to a related work's implementation, not the authors' code for this paper. |
| Open Datasets | Yes | We tested our approach on several MDPs: (1) a set of random discrete MDPs (Garnet problems [7]) (2) Cartpole balancing [31], (3) Helicopter Aerobatics (Hover and Funnel) [32], (4) Swimmer, Hopper and Half-Cheetah from the MuJoCo physics simulator [33]. |
| Dataset Splits | No | The paper mentions a training split for robust policy optimization ('We use 7 of the environments for training and the remaining three for testing.'), but does not specify a separate validation set for model tuning or general performance assessment across all experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the 'MuJoCo physics simulator [33]' but does not provide specific version numbers for this or any other software, libraries, or programming languages used in their implementation. |
| Experiment Setup | Yes | The setup is detailed in Appendix B.4. The setup and the conservative update implementation are detailed in Appendix B.1. |