Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Sampling by averaging: A multiscale approach to score estimation

Authors: Paula Cordero-Encinar, Andrew Duncan, Sebastian Reich, O. Deniz Akyildiz

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirical results across synthetic and real-world benchmarks, including multimodal and high-dimensional distributions, demonstrate that the proposed methods are competitive with existing samplers in terms of accuracy and efficiency, without the need for learned models. |
| Researcher Affiliation | Academia | Paula Cordero-Encinar, Department of Mathematics, Imperial College London, EMAIL; Andrew B. Duncan, Department of Mathematics, Imperial College London, EMAIL; Sebastian Reich, Institut für Mathematik, Universität Potsdam, EMAIL; O. Deniz Akyildiz, Department of Mathematics, Imperial College London, EMAIL |
| Pseudocode | Yes | Algorithm 1: MULTALMC sampler: accelerated version; Algorithm 2: MULTALMC sampler: overdamped version; Algorithm 3: MULTALMC sampler: accelerated version; Algorithm 4: MULTCDIFF sampler |
| Open Source Code | Yes | The code to reproduce our experiments is available at https://github.com/paulaoak/sampling_by_averaging.git. |
| Open Datasets | Yes | Examples from Bayesian statistics: Posterior distributions arising from Bayesian logistic regression tasks on the Ionosphere (dimension 35) and Sonar (dimension 61) datasets. |
| Dataset Splits | No | In the Bayesian logistic regression tasks, performance is measured via the mean predictive log-likelihood, computed as p(w, b \| D_test), where D_test is a held-out test dataset not used during training. (No explicit split percentages or method for creating the split are provided for any dataset.) |
| Hardware Specification | Yes | All experiments were conducted on a GPU server consisting of eight Nvidia GeForce RTX 3090 Ti GPU cards, 896 GB of memory and 14 TB of local on-server data storage. Each GPU has 10496 cores as well as 24 GB of memory. |
| Software Dependencies | No | All experiments were implemented using JAX. (No version numbers provided for JAX or any other libraries.) |
| Experiment Setup | Yes | D.3 Algorithms and hyperparameters: For each baseline algorithm, we perform a grid search over a predefined set of hyperparameter values. Selection is based on the corresponding performance metric, computed using 4096 samples. The selected hyperparameters for each algorithm are summarised below. ... For MULTALMC, the required hyperparameters include the schedule function λ_t, the values of λ_δ and λ, the mass matrix M, ε, the friction coefficient Γ, the step size h, and the number of SROCK steps s. ... (Tables 4, 5, 6, 8 detail selected hyperparameters for various algorithms and experiments). |
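
The Dataset Splits and Experiment Setup entries describe selecting hyperparameters by grid search on a performance metric such as the mean predictive log-likelihood on held-out data. The following is a minimal sketch of that kind of procedure, not the authors' implementation. It assumes labels in {0, 1}, a logistic likelihood, and that the metric is the average of log p(y | x, w, b) over posterior samples and test points; the summary does not spell out the exact formula. The names `mean_predictive_loglik`, `grid_search`, and `run_sampler` are hypothetical, and plain Python is used instead of JAX to keep the example dependency-free.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def mean_predictive_loglik(samples, X_test, y_test):
    """Average of log p(y | x, w, b) over posterior samples and test points.

    samples: list of (w, b) pairs, with w a list of floats and b a float.
    Assumption: labels are 0 or 1 and the likelihood is logistic.
    """
    total = 0.0
    for w, b in samples:
        for x, y in zip(X_test, y_test):
            logit = sum(wi * xi for wi, xi in zip(w, x)) + b
            total += log_sigmoid(logit) if y == 1 else log_sigmoid(-logit)
    return total / (len(samples) * len(X_test))


def grid_search(run_sampler, grid, X_test, y_test):
    """Return the (hyperparameter, score) pair with the highest score.

    run_sampler is a hypothetical callable mapping a hyperparameter value
    to a list of (w, b) posterior samples.
    """
    scored = [
        (h, mean_predictive_loglik(run_sampler(h), X_test, y_test))
        for h in grid
    ]
    return max(scored, key=lambda pair: pair[1])
```

In a setting like the one described, `run_sampler` would draw a fixed number of samples (the summary mentions 4096) for each candidate value, and the grid would range over quantities such as the step size h.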