Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
YEAST: Yet Another Sequential Test
Authors: Alexey Kurennoy, Majed Dodin, Tural Gurbanov, Ana Peleteiro Ramallo
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method using semi-synthetic simulations and demonstrate that it outperforms current state-of-the-art sequential testing approaches. [...] We validate the proposed sequential test using a semi-synthetic simulation experiment based on a public real-world data set. We share the associated code for greater reproducibility [16]. [...] We conduct an empirical study and demonstrate that the proposed sequential test has better Type-I error control and higher power than current state-of-the-art methods for continuous experiment monitoring. [...] Section 4, we assess the correctness of the proposed method using a semi-synthetic experiment (based on a public real-world data set). |
| Researcher Affiliation | Industry | Alexey Kurennoy (Meta); Majed Dodin (Delivery Hero); Tural Gurbanov (SoundCloud); Ana Peleteiro Ramallo (Preply) |
| Pseudocode | Yes | Algorithm 1 YEAST. Require: $N$, $\hat{V}_N$. $b = z_{1-\alpha/2} \sqrt{N \hat{V}_N}$; for $n = 1, \dots, N$ do: if $S_n > b$ then flag significance; end if; end for |
| Open Source Code | Yes | We share the associated code [16] for greater reproducibility and to promote further research on this topic. [...] [16] Alexey Kurennoy. Yeast: Yet another sequential test. Git Repository, 2024. This paper's code companion. https://github.com/akurennoy/yeast. [...] The code implementing our experiments is openly available in the git repository [16]. |
| Open Datasets | Yes | We validate the proposed sequential test using a semi-synthetic simulation experiment based on a public real-world data set. [...] The public dataset [5] consists of all transactions that occurred between 2010-12-01 and 2011-12-09 on a UK-based online store. [...] [5] Daqing Chen. Online Retail. UCI Machine Learning Repository, 2015. DOI: https://doi.org/10.24432/C5BW33. |
| Dataset Splits | Yes | We split it into two halves: the first 6 months were used for estimating parameters ($N$ and $V_N$) and the latter 6 months for validation. Using the validation period, we randomly assigned customers to control and treatment 100,000 times. |
| Hardware Specification | Yes | The simulations take about 15 min to run on a laptop with an Apple M3 Pro CPU and 18 GB RAM. |
| Software Dependencies | No | The paper mentions the use of the `sandwich` R package and its `vcovCL` function but does not provide specific version numbers for either R or the package. It cites papers related to the package, but these are not equivalent to explicit software versions. |
| Experiment Setup | Yes | We set the tuning parameter of the method to 11, 25, and 100 and denote the corresponding versions as m SRTphi11, m SRTphi25, and m SRTphi50. [...] As in [28], we set the numerator of parameter ρ of the method to 250, 500, and 750 and denote the corresponding instances of the method by GAVI250, GAVI500, and GAVI750, respectively. We also included GAVI with the tuning parameter ρ set to 10,000 (the default setting used by Eppo). [...] For computational reasons, we constructed the boundary using 100 interim checkpoints and then extended it in a piecewise-constant manner to support continuous monitoring. [...] In all experiments, the target significance level was set at 5% and the number of replications was set to 100,000. |
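
The decision rule quoted in the Pseudocode row can be illustrated with a minimal Python sketch. This is not the authors' implementation (that is in the repository cited as [16]). It assumes that $S_n$ is a running sum of per-observation increments and that $\hat{V}_N$ is an estimate of the per-observation variance, so that the constant boundary is $b = z_{1-\alpha/2}\sqrt{N \hat{V}_N}$. The function names `yeast_boundary` and `first_crossing` are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Iterable, Optional


def yeast_boundary(n_total: int, var_hat: float, alpha: float = 0.05) -> float:
    """Constant boundary b = z_{1 - alpha/2} * sqrt(N * V_hat).

    Assumption: var_hat is a per-observation variance estimate, so
    n_total * var_hat approximates the variance of the final sum S_N.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(n_total * var_hat)


def first_crossing(increments: Iterable[float], boundary: float) -> Optional[int]:
    """Return the first n (1-indexed) at which the running sum S_n exceeds
    the boundary, or None if it never does.

    Assumption: S_n is the cumulative sum of the supplied increments.
    """
    running_sum = 0.0
    for n, x in enumerate(increments, start=1):
        running_sum += x
        if running_sum > boundary:
            return n  # significance would be flagged at this step
    return None


if __name__ == "__main__":
    b = yeast_boundary(n_total=100, var_hat=1.0, alpha=0.05)
    print(round(b, 3))
    print(first_crossing([5.0] * 5, b))
```

Because the boundary is a single constant computed before monitoring starts, checking it at every observation costs only one comparison per step.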