Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Constrained Best Arm Identification

Authors: Tyron Lardy, Christina Katsimerou, Wouter M. Koolen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulations demonstrate the performance of our algorithms. In this section, we put our algorithm, TaS, to the test on the four bandits depicted in Figure 2.
Researcher Affiliation	Collaboration	Tyron Lardy (CWI and Leiden University); Christina Katsimerou (Booking.com); Wouter M. Koolen (CWI and University of Twente)
Pseudocode	No	The paper describes the 'TaS' algorithm in Section 3, titled 'Asymptotically Optimal Algorithm', but presents it in descriptive text rather than as structured pseudocode or a clearly labeled algorithm figure.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We aim to make the code publicly available at a later date.
Open Datasets	No	In this section, we put our algorithm, TaS, to the test on the four bandits depicted in Figure 2. We treat both the unknown-covariance Gaussian model and the bounded model. For the latter, we clip the Gaussian arms from Figure 2 to [0, 1]^2. - The paper creates simulated bandit instances rather than using pre-existing open datasets.
Dataset Splits	No	The paper uses simulated bandit instances and mentions "All instances were repeated 1000 times, except the hard one, which we ran 500 times," which describes simulation repetitions rather than training/test/validation dataset splits.
Hardware Specification	No	Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [No] Justification: This is not included, because the experiments were run on a personal laptop. We therefore do not expect anyone to run into problems in this regard.
Software Dependencies	No	The paper does not explicitly mention specific software dependencies with version numbers used for implementing the algorithms or running simulations.
Experiment Setup	Yes	All algorithms use the same GLR rule and the stylized stopping threshold log(1/δ) + log log(t), originally used by Garivier and Kaufmann [2016] and heavily adopted in the literature for allowing shorter runtimes while keeping the errors lower than δ. As initialization, we start by pulling each arm 3 times, which is the minimum required for the covariance matrix estimation. We work in the moderate confidence regime of δ = 0.01.
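The experiment setup described in the table can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' code (which, per the table, is not yet public). It shows the stylized stopping threshold log(1/δ) + log log(t) with δ = 0.01, a stopping check that takes the GLR statistic as a given number (computing the constrained GLR statistic itself is not reproduced), an initialization that pulls each arm 3 times, and the bounded model's clipping of bivariate Gaussian arms to [0, 1]^2. The helper names (`stylized_threshold`, `should_stop`, `initial_schedule`, `sample_bounded_arm`), the round-robin order of the initial pulls, the three-arm example, and the arm mean and covariance are all hypothetical; the actual Figure 2 instances are not given here. The rank check illustrates one plausible reading of why 3 pulls are the minimum for 2-D arms: a sample covariance built from n points has rank at most n - 1, so it can only be invertible once n >= 3.

```python
import math
import numpy as np

DELTA = 0.01     # moderate confidence regime used in the experiments
INIT_PULLS = 3   # initial pulls per arm, as stated in the paper

def stylized_threshold(t, delta=DELTA):
    """Stylized stopping threshold log(1/delta) + log(log(t)); needs t > 1."""
    if t <= 1:
        raise ValueError("threshold is defined only for t > 1")
    return math.log(1.0 / delta) + math.log(math.log(t))

def should_stop(glr_statistic, t, delta=DELTA):
    """Stop once the GLR statistic reaches the threshold.

    glr_statistic is a placeholder input: the constrained GLR statistic
    depends on the paper's model and is not reproduced in this sketch.
    """
    return glr_statistic >= stylized_threshold(t, delta)

def initial_schedule(num_arms, pulls=INIT_PULLS):
    """Hypothetical round-robin initialization: each arm pulled `pulls` times."""
    return [arm for _ in range(pulls) for arm in range(num_arms)]

def sample_bounded_arm(mean, cov, rng, n=1):
    """Bounded model: bivariate Gaussian observations clipped to [0, 1]^2."""
    return np.clip(rng.multivariate_normal(mean, cov, size=n), 0.0, 1.0)

rng = np.random.default_rng(0)
# Hypothetical 2-D arm; not one of the Figure 2 instances.
mean = np.array([0.6, 0.4])
cov = np.array([[0.01, 0.002], [0.002, 0.01]])

# Unknown-covariance Gaussian model: the sample covariance of n points in 2-D
# has rank at most n - 1, so 2 pulls give a singular estimate and 3 can be full rank.
two = rng.multivariate_normal(mean, cov, size=2)
three = rng.multivariate_normal(mean, cov, size=3)
rank_two = np.linalg.matrix_rank(np.cov(two, rowvar=False), tol=1e-10)
rank_three = np.linalg.matrix_rank(np.cov(three, rowvar=False), tol=1e-10)

print(initial_schedule(3))
print(round(stylized_threshold(1000), 4))
print(rank_two, rank_three)
```

Keeping the GLR statistic as an input keeps the sketch honest: the threshold and the stopping comparison are stated in the table, while the statistic's computation for the constrained problem is specific to the paper.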