Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Martingale Posterior Neural Networks for Fast Sequential Decision Making

Authors: Gerardo Duran-Martin, Leandro Sánchez-Betancourt, Alvaro Cartea, Kevin Murphy

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, they achieve competitive performance-speed trade-offs in non-stationary contextual bandits and Bayesian optimization, offering 10-100 times faster inference than classical Thompson sampling while maintaining comparable or superior decision performance.
Researcher Affiliation Collaboration 1 Oxford-Man Institute of Quantitative Finance; 2 Mathematical Institute, University of Oxford; 3 Google DeepMind
Pseudocode Yes Algorithm 1 Predictive sampling for sequential decision making in contextual bandits.
Open Source Code Yes Our code is available at https://github.com/gerdm/martingale-posterior-neural-networks.
Open Datasets Yes We consider the MNIST contextual bandit task introduced by [49]... We study the performance of the methods on the KuaiRec dataset of [22]... We consider seven benchmark functions commonly used in BO [5]: Ackley (2D, 5D, 10D, 50D), Branin (2D), Hartmann (6D), and DrawNN (50D and 200D).
Dataset Splits No We take the data $\mathcal{D}_{1:T}$ with $T = 60{,}000$ and $\mathcal{D}_t = (x_t, y_t)$, where $x_t$ is a $(28 \times 28 \times 1)$ array and $y_t \in \{0, 1\}^{10}$ a one-hot-encoded vector such that $(y_t)_i = 1$ if $x_t$ represents the digit $i$ and $0$ otherwise. At every timestep $t = 1, \ldots, T$ each agent is presented the image $x_t$ which it has to classify. A predicted classification is made through the prediction $y_{t|t-1} = f(\mu_{t|t-1}, x_t)$ with $\mu_{t|t-1} = \mathbb{E}[\theta_t \mid \mathcal{D}_{1:t-1}]$ and then updates its beliefs given the (true) reward $y_t$.
Hardware Specification Yes All experiments were run on a TPU v4-8.
Software Dependencies No For DrawNN functions, we use projected gradient descent, implemented with the JAXopt library [4], to optimize the sampled function directly over the continuous domain [0, 1].
Experiment Setup Yes Unless otherwise stated, we train the parameters of the neural network with the AdamW optimization algorithm [44]. ... For AdamW and Muon (used in ε-greedy and in conjunction with LLL), we use a learning rate of $10^{-4}$, and ε = 0.05. For AdamW, we take 5 inner iterations and a buffer size of one, and for Muon, we take 1 inner iteration and a buffer size of one. Next, HiLoFi considers a rank of 50 for the hidden parameters; we take $q_{h,t} = 10^{-6}$ and $q_{\ell} = 10^{-6}$. The initial covariances $\Sigma_{h,0}$ and $\Sigma_{\ell,0}$ are both initialized as identity times a factor of $10^{-1}$.
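
The predict-then-update protocol quoted in the Dataset Splits row (no train/test split; each observation is first predicted, then learned from) can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions made for illustration: a linear scorer stands in for the neural network f, a plain gradient step stands in for the posterior update, the data are a synthetic one-hot stream rather than MNIST, and the names `one_hot`, `predict`, `synthetic_stream`, and `run_online` are hypothetical. Only the ε = 0.05 exploration rate is taken from the Experiment Setup row.

```python
import random


def one_hot(label, num_classes):
    """Return a one-hot list with a 1.0 at position `label`."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def predict(weights, x):
    """Linear stand-in for f(mu, x): one score per class (arm)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def synthetic_stream(num_steps, num_classes):
    """Hypothetical data stream: the context is the one-hot label itself."""
    for t in range(num_steps):
        label = t % num_classes
        yield one_hot(label, num_classes), label


def run_online(stream, num_classes, dim, epsilon=0.05, lr=0.1, seed=0):
    """Predict at step t using only data seen before t, then update."""
    rng = random.Random(seed)
    weights = [[0.0] * dim for _ in range(num_classes)]
    total_reward = 0.0
    for x_t, label_t in stream:
        # Prediction uses parameters fitted on steps 1..t-1 only.
        scores = predict(weights, x_t)
        if rng.random() < epsilon:
            action = rng.randrange(num_classes)  # explore
        else:
            action = max(range(num_classes), key=scores.__getitem__)  # exploit
        # Bandit feedback: only the reward of the chosen arm is observed.
        reward = one_hot(label_t, num_classes)[action]
        total_reward += reward
        # Squared-error gradient step on the chosen arm (assumed update rule).
        err = reward - scores[action]
        weights[action] = [w + lr * err * v for w, v in zip(weights[action], x_t)]
    return weights, total_reward
```

Calling `run_online(synthetic_stream(3000, 3), num_classes=3, dim=3)` returns the final weights and the cumulative reward. Because every observation is scored before it is used for learning, the cumulative reward itself serves as the evaluation metric, which is consistent with the "No" recorded for Dataset Splits.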