Deep Bayesian Quadrature Policy Optimization
Authors: Ravi Tej Akella, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Animashree Anandkumar, Yisong Yue (pp. 6600-6608)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that DBQPG can substitute Monte-Carlo estimation in policy gradient methods, and demonstrate its effectiveness on a set of continuous control benchmarks. In comparison to Monte-Carlo estimation, DBQPG provides (i) more accurate gradient estimates with a significantly lower variance, (ii) a consistent improvement in the sample complexity and average return for several deep policy gradient algorithms, and, (iii) the uncertainty in gradient estimation that can be incorporated to further improve the performance. |
| Researcher Affiliation | Collaboration | 1: Purdue University, 2: Google Research, 3: Caltech. {rakella,kamyar}@purdue.edu, ghavamza@google.com, {anima,yyue}@caltech.edu |
| Pseudocode | Yes | Algorithm 1 BQ-PG Estimator Subroutine |
| Open Source Code | Yes | Code: https://github.com/Akella17/Deep-Bayesian-Quadrature-Policy-Optimization |
| Open Datasets | Yes | We study the behaviour of BQ-PG methods (Algorithm 1) on MuJoCo environments, using the mujoco-py library of OpenAI Gym (Brockman et al. 2016). |
| Dataset Splits | No | The paper describes training and evaluation on MuJoCo environments, but does not provide specific train/test/validation dataset splits (e.g., percentages or sample counts) as commonly seen in supervised learning contexts. Data is generated through interaction with the environment rather than being a static dataset with predefined splits. |
| Hardware Specification | No | The paper mentions 'GPU acceleration' and 'GPU hardware' in relation to the GPyTorch library, but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software libraries like 'GPyTorch' and the 'mujoco-py library of OpenAI Gym', but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | No | The paper describes architectural choices like using 'generalized advantage estimates' and a 'deep RBF kernel' on top of a 'DNN feature extractor'. However, it does not provide specific hyperparameter values such as learning rates, batch sizes, number of epochs, or detailed optimizer settings in the main text. |
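The core idea the paper substitutes for Monte-Carlo averaging can be illustrated with a small, self-contained sketch of vanilla Bayesian quadrature. The snippet below is not the paper's DBQPG implementation (which uses a deep RBF kernel in GPyTorch with GPU acceleration); it is a minimal NumPy illustration under assumed toy choices: a synthetic 1-D integrand `f` standing in for the score-function-times-return terms, a plain RBF kernel, and an empirical kernel-mean embedding computed from the same samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D integrand samples: f(x_i) stands in for the
# score-function-times-return terms a policy gradient averages.
# True expectation of sin(X) under a standard normal is 0.
n = 50
x = rng.normal(size=(n, 1))
f = np.sin(x[:, 0]) + 0.1 * rng.normal(size=n)

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential (RBF) kernel matrix between sample sets."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

K = rbf(x, x)
noise = 1e-2  # assumed observation-noise variance

# Empirical kernel mean embedding z_i ~= E_p[k(X, x_i)], approximated
# with the same samples (a common practical shortcut).
z = K.mean(axis=0)

# BQ posterior mean of the integral: z^T (K + sigma^2 I)^{-1} f.
w = np.linalg.solve(K + noise * np.eye(n), f)
bq_estimate = z @ w

# Plain Monte-Carlo average of the same samples, for comparison.
mc_estimate = f.mean()
```

In the paper's setting, the GP is placed over state-action features from a DNN extractor and the resulting posterior also yields the gradient-uncertainty estimates listed under point (iii) above; this sketch only shows the quadrature weighting `z^T (K + sigma^2 I)^{-1}` that replaces the uniform Monte-Carlo weights `1/n`.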