Deep Bayesian Quadrature Policy Optimization
Authors: Ravi Tej Akella, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Animashree Anandkumar, Yisong Yue (pp. 6600-6608)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that DBQPG can substitute Monte-Carlo estimation in policy gradient methods, and demonstrate its effectiveness on a set of continuous control benchmarks. In comparison to Monte-Carlo estimation, DBQPG provides (i) more accurate gradient estimates with a significantly lower variance, (ii) a consistent improvement in the sample complexity and average return for several deep policy gradient algorithms, and, (iii) the uncertainty in gradient estimation that can be incorporated to further improve the performance. |
| Researcher Affiliation | Collaboration | 1: Purdue University, 2: Google Research, 3: Caltech. {rakella,kamyar}@purdue.edu, ghavamza@google.com, {anima,yyue}@caltech.edu |
| Pseudocode | Yes | Algorithm 1 BQ-PG Estimator Subroutine |
| Open Source Code | Yes | Code: https://github.com/Akella17/Deep-Bayesian-Quadrature-Policy-Optimization |
| Open Datasets | Yes | We study the behaviour of BQ-PG methods (Algorithm 1) on MuJoCo environments, using the mujoco-py library of OpenAI Gym (Brockman et al. 2016). |
| Dataset Splits | No | The paper describes training and evaluation on MuJoCo environments, but does not provide specific train/test/validation dataset splits (e.g., percentages or sample counts) as commonly seen in supervised learning contexts. Data is generated through interaction with the environment rather than being a static dataset with predefined splits. |
| Hardware Specification | No | The paper mentions 'GPU acceleration' and 'GPU hardware' in relation to the GPyTorch library, but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software libraries like 'GPyTorch' and the 'mujoco-py library of OpenAI Gym', but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | No | The paper describes architectural choices like using 'generalized advantage estimates' and a 'deep RBF kernel' on top of a 'DNN feature extractor'. However, it does not provide specific hyperparameter values such as learning rates, batch sizes, number of epochs, or detailed optimizer settings in the main text. |
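The core idea the paper substitutes for Monte-Carlo averaging can be illustrated with a small, self-contained sketch of vanilla Bayesian quadrature. The snippet below is not the paper's DBQPG implementation (which uses a deep RBF kernel in GPyTorch with GPU acceleration); it is a minimal NumPy illustration under assumed toy choices: a synthetic 1-D integrand `f` standing in for the score-function-times-return terms, a plain RBF kernel, and an empirical kernel-mean embedding computed from the same samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D integrand samples: f(x_i) stands in for the
# score-function-times-return terms a policy gradient averages.
# True expectation of sin(X) under a standard normal is 0.
n = 50
x = rng.normal(size=(n, 1))
f = np.sin(x[:, 0]) + 0.1 * rng.normal(size=n)

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential (RBF) kernel matrix between sample sets."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

K = rbf(x, x)
noise = 1e-2  # assumed observation-noise variance

# Empirical kernel mean embedding z_i ~= E_p[k(X, x_i)], approximated
# with the same samples (a common practical shortcut).
z = K.mean(axis=0)

# BQ posterior mean of the integral: z^T (K + sigma^2 I)^{-1} f.
w = np.linalg.solve(K + noise * np.eye(n), f)
bq_estimate = z @ w

# Plain Monte-Carlo average of the same samples, for comparison.
mc_estimate = f.mean()
```

In the paper's setting, the GP is placed over state-action features from a DNN extractor and the resulting posterior also yields the gradient-uncertainty estimates listed under point (iii) above; this sketch only shows the quadrature weighting `z^T (K + sigma^2 I)^{-1}` that replaces the uniform Monte-Carlo weights `1/n`.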