Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

YEAST: Yet Another Sequential Test

Authors: Alexey Kurennoy, Majed Dodin, Tural Gurbanov, Ana Peleteiro Ramallo

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate our method using semi-synthetic simulations and demonstrate that it outperforms current state-of-the-art sequential testing approaches. [...] We validate the proposed sequential test using a semi-synthetic simulation experiment based on a public real-world data set. We share the associated code for greater reproducibility [16]. [...] We conduct an empirical study and demonstrate that the proposed sequential test has better Type-I error control and higher power than current state-of-the-art methods for continuous experiment monitoring. [...] Section 4, we assess the correctness of the proposed method using a semi-synthetic experiment (based on a public real-world data set).
Researcher Affiliation	Industry	Alexey Kurennoy, Meta, EMAIL; Majed Dodin, Delivery Hero, EMAIL; Tural Gurbanov, SoundCloud, EMAIL; Ana Peleteiro Ramallo, Preply, EMAIL
Pseudocode	Yes	Algorithm 1 YEAST. Require: $N$, $\hat{V}_N$. $b = z_{1-\alpha/2} \sqrt{N \hat{V}_N}$; for $n = 1, \dots, N$ do: if $S_n > b$ then flag significance; end if; end for
Open Source Code	Yes	We share the associated code [16] for greater reproducibility and to promote further research on this topic. [...] [16] Alexey Kurennoy. Yeast: Yet another sequential test. Git Repository, 2024. This paper's code companion. https://github.com/akurennoy/yeast. [...] The code implementing our experiments is openly available in the git repository [16].
Open Datasets	Yes	We validate the proposed sequential test using a semi-synthetic simulation experiment based on a public real-world data set. [...] The public dataset [5] consists of all transactions that occurred between 2010-12-01 and 2011-12-09 on a UK-based online store. [...] [5] Daqing Chen. Online Retail. UCI Machine Learning Repository, 2015. DOI: https://doi.org/10.24432/C5BW33.
Dataset Splits	Yes	We split it into two halves: the first 6 months were used for estimating parameters (N and V_N) and the latter 6 months for validation. Using the validation period, we randomly assigned customers to control and treatment 100,000 times.
Hardware Specification	Yes	The simulations take about 15 min to run on a laptop with an Apple M3 Pro CPU and 18 GB RAM.
Software Dependencies	No	The paper mentions the 'sandwich' R package and its 'vcovCL' function but gives no version numbers for either R or the package. It cites papers related to the package, but these are not equivalent to explicit software versions.
Experiment Setup	Yes	We set the tuning parameter of the method to 11, 25, and 100 and denote the corresponding versions as m SRTphi11, m SRTphi25, and m SRTphi50. [...] As in [28], we set the numerator of parameter ρ of the method to 250, 500, and 750 and denote the corresponding instances of the method by GAVI250, GAVI500, and GAVI750, respectively. We also included GAVI with the tuning parameter ρ set to 10,000 (the default setting used by Eppo). [...] For computational reasons, we constructed the boundary using 100 interim checkpoints and then extended it in a piecewise-constant manner to support continuous monitoring. [...] In all experiments, the target significance level was set at 5% and the number of replications was set to 100,000.
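
The Pseudocode row above quotes Algorithm 1 only in flattened form. A minimal Python sketch of that loop follows, under stated assumptions: the boundary is read as b = z_{1-alpha/2} * sqrt(N * V_N), S_n is taken to be a running sum of per-observation increments, and the function names `yeast_boundary` and `yeast_first_crossing` are hypothetical. It is not the authors' implementation; the companion repository [16] is the authoritative source.

```python
import math
from statistics import NormalDist


def yeast_boundary(n_total, var_hat, alpha=0.05):
    """Constant boundary b = z_{1-alpha/2} * sqrt(N * V_N).

    n_total (N) and var_hat (V_N) are assumed to have been estimated
    beforehand, e.g. on a separate estimation period.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(n_total * var_hat)


def yeast_first_crossing(increments, n_total, var_hat, alpha=0.05):
    """Return the 1-based index n of the first S_n > b, or None.

    Assumption: S_n is the cumulative sum of the first n increments.
    Only the first n_total increments are inspected.
    """
    b = yeast_boundary(n_total, var_hat, alpha)
    running_sum = 0.0
    for n, x in enumerate(increments[:n_total], start=1):
        running_sum += x
        if running_sum > b:
            return n  # flag significance at step n
    return None  # boundary never crossed


if __name__ == "__main__":
    # Toy increments, not data from the paper.
    print(yeast_first_crossing([10.0] * 5, n_total=100, var_hat=1.0))
```

Because the boundary is a single constant, checking it after every observation adds no per-step cost beyond updating the running sum.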