Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

e-GAI: e-value-based Generalized $α$-Investing for Online False Discovery Rate Control

Authors: Yifan Zhang, Zijian Wei, Haojie Ren, Changliang Zou

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Both simulated and real data experiments demonstrate the advantages of both e-LORD and e-SAFFRON in FDR control and power. In this section, we evaluate the performance of our online testing framework on both synthetic and real data. We compare e-LORD, e-SAFFRON, pL-RAI, and pS-RAI with e-LOND, LORD++, SAFFRON, and SupLORD in terms of FDR and power.
Researcher Affiliation	Academia	(1) School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China; (2) School of Statistics and Data Sciences, LPMC, KLMDASR and LEBPS, Nankai University, Tianjin, China. Correspondence to: Haojie Ren <EMAIL>.
Pseudocode	Yes	Algorithm 1 (e-LORD). 1: Input: target FDR level $α$, initial allocation coefficient $ω_1 ∈ (0, 1)$, parameters $φ$ and $ψ ∈ (0, 1)$, sequence of e-values $e_1, e_2, \ldots$ 2: Calculate $α_1 = α ω_1$ and decide $δ_1 = \mathbb{1}\{e_1 ≥ 1/α_1\}$; 3: Update $R_1 = δ_1$ and $ω_2$ by (9); 4: for $t = 2, 3, \ldots$ do 5: Update testing level $α_t$ by (8); 6: Make decision $δ_t = \mathbb{1}\{e_t ≥ 1/α_t\}$; 7: Update $R_t = R_{t-1} + δ_t$ and $ω_{t+1}$ by (9); 8: end for 9: Output: decision set $\{δ_1, δ_2, \ldots\}$. Algorithm 2 (e-SAFFRON). 1: Input: target FDR level $α$, initial allocation coefficient $ω_1 ∈ (0, 1)$, parameters $λ$, $φ$ and $ψ ∈ (0, 1)$, sequence of e-values $e_1, e_2, \ldots$ 2: Calculate $α_1 = α(1 - λ)ω_1$ and decide $δ_1 = \mathbb{1}\{e_1 ≥ 1/α_1\}$; 3: Update $R_1 = δ_1$ and $ω_2$ by (9); 4: for $t = 2, 3, \ldots$ do 5: Update testing level $α_t$ by (10); 6: Make decision $δ_t = \mathbb{1}\{e_t ≥ 1/α_t\}$; 7: Update $R_t = R_{t-1} + δ_t$ and $ω_{t+1}$ by (9); 8: end for 9: Output: decision set $\{δ_1, δ_2, \ldots\}$.
Open Source Code	Yes	The code for all numerical experiments in this paper is available at https://github.com/zijianwei01/e-GAI.
Open Datasets	Yes	We analyze the NYC taxi dataset from the Numenta Anomaly Benchmark (NAB) repository (Lavin & Ahmad, 2015).
Dataset Splits	Yes	The first 2000 time points are taken as the initial sequence for model calibration. The calibration is implemented by using the first 1/3 observations, assuming the related period to be free of bubbles.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	Yes	We use default parameters from the R package onlineFDR (Robertson et al., 2022) for other benchmarks. R package 2.5.1.
Experiment Setup	Yes	We take ω1 = 0.005, φ = ψ = 0.5 in e-LORD and pL-RAI, and additionally λ = 0.1 in e-SAFFRON and pS-RAI, while we use default parameters from the R package onlineFDR (Robertson et al., 2022) for other benchmarks. The target FDR level is set as α = 0.05.
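
To make the Pseudocode and Experiment Setup rows concrete, here is a minimal Python sketch of the decision loop shared by Algorithms 1 and 2, using the quoted settings (α = 0.05, ω1 = 0.005, λ = 0.1). It is an illustration under stated assumptions, not the authors' implementation (their code is in the linked repository). The update rules (8), (9), and (10) are not reproduced on this page, so the sketch takes the testing-level update as a caller-supplied function. The names `first_level`, `online_e_test`, and `next_level` are hypothetical, and the constant-level rule in the usage lines is only a placeholder that does not provide the paper's FDR guarantee.

```python
from typing import Callable, Iterable, List

# Settings quoted in the Experiment Setup row.
ALPHA = 0.05     # target FDR level
OMEGA_1 = 0.005  # initial allocation coefficient
LAMBDA = 0.1     # additional parameter used by e-SAFFRON


def first_level(alpha: float, omega_1: float, lam: float = 0.0) -> float:
    """Initial testing level from step 2 of the pseudocode.

    lam = 0 gives alpha_1 = alpha * omega_1 (Algorithm 1);
    lam > 0 gives alpha_1 = alpha * (1 - lam) * omega_1 (Algorithm 2).
    """
    return alpha * (1.0 - lam) * omega_1


def online_e_test(
    e_values: Iterable[float],
    alpha_1: float,
    next_level: Callable[[int, int, List[int]], float],
) -> List[int]:
    """Run the generic loop: reject hypothesis t when e_t >= 1 / alpha_t.

    `next_level(t, rejections, decisions)` is a hypothetical stand-in for
    the paper's update rules (8)-(10), which are not shown on this page.
    """
    decisions: List[int] = []
    rejections = 0  # R_{t-1}: number of rejections made so far
    level = alpha_1
    for t, e in enumerate(e_values, start=1):
        if t > 1:
            level = next_level(t, rejections, decisions)
        delta = 1 if e >= 1.0 / level else 0  # delta_t = 1{e_t >= 1/alpha_t}
        decisions.append(delta)
        rejections += delta  # R_t = R_{t-1} + delta_t
    return decisions


# Usage with a placeholder rule that keeps the level fixed at alpha_1.
# This is NOT the paper's update; it only exercises the loop.
alpha_1 = first_level(ALPHA, OMEGA_1)
demo_decisions = online_e_test(
    [1.0, 5000.0, 10.0, 8000.0], alpha_1, lambda t, r, d: alpha_1
)
print(demo_decisions)  # [0, 1, 0, 1]
```

With these settings the first rejection threshold 1/α1 is about 4000 for e-LORD, so only very large e-values are rejected early on; the paper's update rules then adapt later levels based on past decisions.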