Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Augmented Bayesian Policy Search

Authors: Mahdi Kallel, Debabrota Basu, Riad Akrour, Carlo D'Eramo

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate ABS on high-dimensional locomotion problems and demonstrate competitive performance compared to existing direct policy search schemes. (Abstract); We provide empirical evidence on the effectiveness of our novel mean function, demonstrating that ABS surpasses previous BO schemes in high-dimensional MuJoCo locomotion problems (Todorov et al., 2012). (Introduction); In this experimental analysis, we aim to answer the following questions: (Section 5 header)
Researcher Affiliation | Academia | 1 Center for Artificial Intelligence and Data Science, University of Würzburg, Germany; 2 Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, France; 3 Department of Computer Science, TU Darmstadt, Germany; 4 Hessian Center for Artificial Intelligence (Hessian.ai), Germany
Pseudocode | Yes | Algorithm 1 Rollout for one point (Section 4); Algorithm 2 Augmented Bayesian Search (ABS) (Section 4)
Open Source Code | Yes | Our code is available at https://github.com/kallel-mahdi/abs.
Open Datasets | Yes | We validate our contributions using MuJoCo locomotion tasks (Todorov et al., 2012).
Dataset Splits | No | The paper describes the use of a "validation metric" (e.g., e R2 in Section 5.2) to evaluate the quality of Q-function estimators and aggregation schemes by using "samples collected in the last 3 outer loop steps of Algorithm 2". However, it does not specify a distinct, reproducible training/validation/test dataset split (e.g., specific percentages or fixed sample counts) for the overall model evaluation.
Hardware Specification | Yes | We run all of our experiments on a cluster of NVIDIA A100 40 GB GPU.
Software Dependencies | Yes | We have implemented ABS using Python 3.10 and Jax 0.4.20, we run all of our experiments on a cluster of NVIDIA A100 40 GB GPU.
Experiment Setup | Yes | We use a discount factor γ = 0.99 for all tasks, apart from Swimmer where it is γ = 0.995. (Section 5.1); For our ensemble of critics we use 5 distinct DroQ networks. After querying for an acquisition point z, we perform 5000 gradient steps to the Q-function. (Section 5.1); Table 1: Hyperparameters used for grid-search for MPD and ABS. (Appendix A); Table 2: Hyperparameters used for ABS. (Appendix A)
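
For readers who want to record the quoted Experiment Setup values in one place, the following is a minimal sketch only. It is not taken from the authors' repository: the names ABSSetup, GAMMA_OVERRIDES, DEFAULT_GAMMA, and discounted_return are hypothetical, and "Hopper" is used purely as an illustrative task name. Only the numeric values (γ = 0.99, γ = 0.995 for Swimmer, 5 critics, 5000 gradient steps) come from the quoted Section 5.1 text.

```python
from dataclasses import dataclass

# Values quoted from Section 5.1; all identifiers below are hypothetical.
DEFAULT_GAMMA = 0.99
GAMMA_OVERRIDES = {"Swimmer": 0.995}  # the only exception named in the quote


@dataclass(frozen=True)
class ABSSetup:
    """Illustrative container for the reported setup (not the authors' API)."""
    task: str
    num_critics: int = 5               # "5 distinct DroQ networks"
    critic_gradient_steps: int = 5000  # gradient steps per acquisition point

    @property
    def gamma(self) -> float:
        # Fall back to the default discount factor unless the task is overridden.
        return GAMMA_OVERRIDES.get(self.task, DEFAULT_GAMMA)


def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    for name in ("Swimmer", "Hopper"):  # "Hopper" is an assumed example task
        setup = ABSSetup(task=name)
        print(name, setup.gamma, setup.num_critics, setup.critic_gradient_steps)
```

The remaining hyperparameters are listed in Tables 1 and 2 of the paper's Appendix A and are not reproduced here.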