Augmented Bayesian Policy Search

Authors: Mahdi Kallel, Debabrota Basu, Riad Akrour, Carlo D'Eramo

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate ABS on high-dimensional locomotion problems and demonstrate competitive performance compared to existing direct policy search schemes." (Abstract); "We provide empirical evidence on the effectiveness of our novel mean function, demonstrating that ABS surpasses previous BO schemes in high-dimensional MuJoCo locomotion problems (Todorov et al., 2012)." (Introduction); "In this experimental analysis, we aim to answer the following questions:" (Section 5 header)
Researcher Affiliation | Academia | 1 Center for Artificial Intelligence and Data Science, University of Würzburg, Germany; 2 Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, France; 3 Department of Computer Science, TU Darmstadt, Germany; 4 Hessian Center for Artificial Intelligence (Hessian.ai), Germany
Pseudocode | Yes | Algorithm 1 "Rollout for one point" (Section 4); Algorithm 2 "Augmented Bayesian Search (ABS)" (Section 4)
Open Source Code | Yes | "Our code is available at https://github.com/kallel-mahdi/abs."
Open Datasets | Yes | "We validate our contributions using MuJoCo locomotion tasks (Todorov et al., 2012)."
Dataset Splits | No | The paper uses a "validation metric" (e.g., eR2 in Section 5.2) to evaluate the quality of Q-function estimators and aggregation schemes on "samples collected in the last 3 outer loop steps of Algorithm 2", but it does not specify a distinct, reproducible training/validation/test split (e.g., fixed percentages or sample counts) for the overall model evaluation.
Hardware Specification | Yes | "We run all of our experiments on a cluster of NVIDIA A100 40 GB GPUs."
Software Dependencies | Yes | "We have implemented ABS using Python 3.10 and Jax 0.4.20; we run all of our experiments on a cluster of NVIDIA A100 40 GB GPUs."
Experiment Setup | Yes | "We use a discount factor γ = 0.99 for all tasks, apart from Swimmer where it is γ = 0.995." (Section 5.1); "For our ensemble of critics we use 5 distinct DroQ networks. After querying for an acquisition point z, we perform 5000 gradient steps to the Q-function." (Section 5.1); Table 1: "Hyperparameters used for grid-search for MPD and ABS." (Appendix A); Table 2: "Hyperparameters used for ABS." (Appendix A)
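The quoted experiment-setup details can be gathered into a small configuration sketch. This is a minimal illustration only: the function name and dictionary keys are assumptions for readability, not identifiers from the authors' repository; the values are the ones quoted above.

```python
def make_abs_config(task: str) -> dict:
    """Hedged sketch of the ABS experiment settings quoted from Section 5.1.

    Key names are illustrative assumptions; values follow the paper's text:
    gamma = 0.99 for all tasks except Swimmer (0.995), an ensemble of
    5 distinct DroQ critics, and 5000 Q-function gradient steps per
    acquisition point.
    """
    return {
        "discount_factor": 0.995 if task == "Swimmer" else 0.99,
        "n_droq_critics": 5,
        "q_gradient_steps": 5000,
    }


# Example: the Swimmer task uses the larger discount factor.
config = make_abs_config("Swimmer")
print(config["discount_factor"])  # 0.995
```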