Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
e-GAI: e-value-based Generalized $α$-Investing for Online False Discovery Rate Control
Authors: Yifan Zhang, Zijian Wei, Haojie Ren, Changliang Zou
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both simulated and real data experiments demonstrate the advantages of both e-LORD and e-SAFFRON in FDR control and power. In this section, we evaluate the performance of our online testing framework on both synthetic and real data. We compare e-LORD, e-SAFFRON, pL-RAI, and pS-RAI with e-LOND, LORD++, SAFFRON, and SupLORD in terms of FDR and power. |
| Researcher Affiliation | Academia | ¹School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China; ²School of Statistics and Data Sciences, LPMC, KLMDASR and LEBPS, Nankai University, Tianjin, China. Correspondence to: Haojie Ren <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (e-LORD). 1: Input: target FDR level $\alpha$, initial allocation coefficient $\omega_1 \in (0, 1)$, parameters $\varphi, \psi \in (0, 1)$, sequence of e-values $e_1, e_2, \ldots$; 2: Calculate $\alpha_1 = \alpha\omega_1$ and decide $\delta_1 = \mathbb{1}\{e_1 \ge 1/\alpha_1\}$; 3: Update $R_1 = \delta_1$ and $\omega_2$ by (9); 4: for $t = 2, 3, \ldots$ do 5: Update testing level $\alpha_t$ by (8); 6: Make decision $\delta_t = \mathbb{1}\{e_t \ge 1/\alpha_t\}$; 7: Update $R_t = R_{t-1} + \delta_t$ and $\omega_{t+1}$ by (9); 8: end for 9: Output: decision set $\{\delta_1, \delta_2, \ldots\}$. Algorithm 2 (e-SAFFRON). 1: Input: target FDR level $\alpha$, initial allocation coefficient $\omega_1 \in (0, 1)$, parameters $\lambda, \varphi, \psi \in (0, 1)$, sequence of e-values $e_1, e_2, \ldots$; 2: Calculate $\alpha_1 = \alpha(1 - \lambda)\omega_1$ and decide $\delta_1 = \mathbb{1}\{e_1 \ge 1/\alpha_1\}$; 3: Update $R_1 = \delta_1$ and $\omega_2$ by (9); 4: for $t = 2, 3, \ldots$ do 5: Update testing level $\alpha_t$ by (10); 6: Make decision $\delta_t = \mathbb{1}\{e_t \ge 1/\alpha_t\}$; 7: Update $R_t = R_{t-1} + \delta_t$ and $\omega_{t+1}$ by (9); 8: end for 9: Output: decision set $\{\delta_1, \delta_2, \ldots\}$. |
| Open Source Code | Yes | The code for all numerical experiments in this paper is available at https://github.com/zijianwei01/e-GAI. |
| Open Datasets | Yes | We analyze the NYC taxi dataset from the Numenta Anomaly Benchmark (NAB) repository (Lavin & Ahmad, 2015). |
| Dataset Splits | Yes | The first 2000 time points are taken as the initial sequence for model calibration. The calibration is implemented using the first 1/3 of the observations, assuming this period to be free of bubbles. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | Yes | We use default parameters from the R package onlineFDR (Robertson et al., 2022) for other benchmarks. R package version 2.5.1. |
| Experiment Setup | Yes | We take $\omega_1 = 0.005$ and $\varphi = \psi = 0.5$ in e-LORD and pL-RAI, and additionally $\lambda = 0.1$ in e-SAFFRON and pS-RAI, while we use default parameters from the R package onlineFDR (Robertson et al., 2022) for the other benchmarks. The target FDR level is set as $\alpha = 0.05$. |
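The Research Type row compares methods by FDR and power. In simulations, where it is known which hypotheses are truly null, these are commonly estimated as follows: the false discovery proportion $\mathrm{FDP} = V/\max(R, 1)$ (with $V$ false rejections and $R$ total rejections) is averaged over replications to estimate the FDR, and power is the fraction of non-null hypotheses that are rejected. This is a minimal sketch with hypothetical function names, not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple


def fdp_and_power(decisions: Sequence[bool], is_null: Sequence[bool]) -> Tuple[float, float]:
    """Return (false discovery proportion, power) for a single run.

    decisions[t] is True if hypothesis t was rejected; is_null[t] is True if
    hypothesis t is a true null (known only for simulated data).
    """
    if len(decisions) != len(is_null):
        raise ValueError("decisions and is_null must have the same length")
    n_rej = sum(bool(d) for d in decisions)                                # R
    n_false = sum(bool(d) and bool(h) for d, h in zip(decisions, is_null))  # V
    n_alt = sum(not h for h in is_null)                                    # number of non-nulls
    fdp = n_false / max(n_rej, 1)                                          # V / max(R, 1)
    power = (n_rej - n_false) / n_alt if n_alt > 0 else 0.0
    return fdp, power


def empirical_fdr_and_power(runs: List[Tuple[Sequence[bool], Sequence[bool]]]) -> Tuple[float, float]:
    """Average FDP (an estimate of the FDR) and power over independent replications."""
    if not runs:
        raise ValueError("need at least one replication")
    pairs = [fdp_and_power(d, h) for d, h in runs]
    n = len(pairs)
    return sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n


# One run: 3 rejections, 1 of them a true null, 3 non-nulls in total.
fdp, power = fdp_and_power([True, True, False, True], [True, False, False, False])
print(fdp, power)
```

Averaging FDP rather than dividing total false rejections by total rejections keeps the estimate aligned with the definition of FDR as an expectation.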
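The two algorithms in the Pseudocode row share one loop: set a testing level $\alpha_t$, reject when $e_t \ge 1/\alpha_t$, and accumulate $R_t = R_{t-1} + \delta_t$. The sketch below covers only that shared skeleton. The level and allocation updates (Eqs. (8), (9), and (10)) are not stated on this page, so they enter through a caller-supplied function `next_level`, a hypothetical interface; the constant level in the usage lines is a placeholder for illustration, not the paper's rule.

```python
from itertools import accumulate
from typing import Callable, Iterable, List

# Hypothetical interface: next_level(t, decisions) returns alpha_t for t >= 2,
# given delta_1, ..., delta_{t-1}. In the paper this role is played by Eq. (8)
# (e-LORD) or Eq. (10) (e-SAFFRON) together with the omega_{t+1} update of
# Eq. (9); those formulas are NOT reproduced here.
NextLevel = Callable[[int, List[int]], float]


def online_e_test(e_values: Iterable[float], alpha_1: float,
                  next_level: NextLevel) -> List[int]:
    """Shared skeleton of Algorithms 1 and 2; returns [delta_1, delta_2, ...]."""
    decisions: List[int] = []
    for t, e_t in enumerate(e_values, start=1):
        alpha_t = alpha_1 if t == 1 else next_level(t, decisions)
        if alpha_t < 0:
            raise ValueError(f"alpha_{t} must be non-negative, got {alpha_t}")
        # Reject iff e_t >= 1 / alpha_t; a zero level never rejects.
        delta_t = int(alpha_t > 0 and e_t >= 1.0 / alpha_t)
        decisions.append(delta_t)
    return decisions


def rejection_counts(decisions: List[int]) -> List[int]:
    """R_t = R_{t-1} + delta_t, i.e. the running number of rejections."""
    return list(accumulate(decisions))


def constant_level(t: int, decisions: List[int]) -> float:
    """Placeholder level rule for illustration only (NOT Eq. (8) or Eq. (10))."""
    return 0.01


# alpha_1 = alpha * omega_1 as in step 2 of e-LORD, with alpha = 0.05, omega_1 = 0.005.
decisions = online_e_test([300.0, 50.0, 120.0], alpha_1=0.05 * 0.005,
                          next_level=constant_level)
print(decisions, rejection_counts(decisions))
```

Passing the decision history to `next_level` keeps the skeleton agnostic to which update rule is plugged in; the same loop serves e-LORD and e-SAFFRON, which differ only in $\alpha_1$ and in the level update.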
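The Dataset Splits row describes two calibration schemes: a fixed prefix of the first 2000 time points, and the first third of the observations. A minimal sketch of both splits follows, assuming the series is an ordered Python sequence; `split_calibration` is a hypothetical helper, and how the calibration segment is then used for model calibration is not shown.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple


def split_calibration(series: Sequence[float], n_calib: Optional[int] = None,
                      frac: Optional[Fraction] = None) -> Tuple[List[float], List[float]]:
    """Split an ordered series into (calibration prefix, online test stream).

    Give exactly one of n_calib (e.g. 2000) or frac (e.g. Fraction(1, 3)).
    Time order is preserved: the calibration segment always precedes the stream.
    """
    if (n_calib is None) == (frac is None):
        raise ValueError("give exactly one of n_calib or frac")
    if n_calib is not None:
        k = n_calib
    else:
        # Exact integer arithmetic avoids float rounding (e.g. 1/3 of 9000 is 3000).
        k = len(series) * frac.numerator // frac.denominator
    if not 0 < k < len(series):
        raise ValueError("calibration segment must be non-empty and leave a non-empty stream")
    return list(series[:k]), list(series[k:])


calib, stream = split_calibration(list(range(3000)), n_calib=2000)
print(len(calib), len(stream))
```

Using a prefix rather than a random subset matters here: in an online setting the calibration data must come before the observations that are tested.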
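Plugging the Experiment Setup values ($\alpha = 0.05$, $\omega_1 = 0.005$, $\lambda = 0.1$) into step 2 of the e-LORD and e-SAFFRON algorithms gives the first testing level and the e-value the first hypothesis must reach to be rejected, since $\delta_1 = 1$ exactly when $e_1 \ge 1/\alpha_1$:

```latex
\begin{aligned}
\text{e-LORD:}\quad & \alpha_1 = \alpha\,\omega_1 = 0.05 \times 0.005 = 2.5 \times 10^{-4},
  & 1/\alpha_1 &= 4000,\\
\text{e-SAFFRON:}\quad & \alpha_1 = \alpha(1-\lambda)\,\omega_1 = 0.05 \times 0.9 \times 0.005 = 2.25 \times 10^{-4},
  & 1/\alpha_1 &\approx 4444.4.
\end{aligned}
```

With $\omega_1 = 0.005$, the first hypothesis is therefore rejected only for an e-value in the thousands.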