Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Simple and Optimal Sublinear Algorithms for Mean Estimation

Authors: Beatrice Bertolotti, Matteo Russo, Chris Schwiegelshohn, Sudarshan Shyam

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We finally provide an extensive experimental evaluation among several estimators which concludes that the geometric-median-of-means-based approach is typically the most competitive in practice.
Researcher Affiliation	Academia	Beatrice Bertolotti, University of Pavia, Italy; Matteo Russo, Sapienza University of Rome, Italy; Chris Schwiegelshohn, Aarhus University, Denmark; Sudarshan Shyam, Aarhus University, Denmark
Pseudocode	Yes	Algorithm 1 MEANESTIMATE(ε, δ): for i = 1, ..., b log δ^{-1} do: sample S_i points independently and uniformly at random with |S_i| = a ε^{-1}; compute the sample mean µ̂_i = (1/|S_i|) Σ_{p ∈ S_i} p. Output AGGREGATE(µ̂_1, ..., µ̂_{b log δ^{-1}}).
Open Source Code	Yes	Code base and results can be found at https://github.com/matteorusso/sublinear_mean_estimation.
Open Datasets	Yes	We test our algorithms against the benchmarks mentioned above on the following datasets: MNIST and Fashion-MNIST, both of which are composed of 60,000 points, each with 784 features. We also consider Cover Type, composed of 581,012 points, each with 54 features.
Dataset Splits	No	For each sample size m ∈ {10, 15, 20, 25, 30, 100, 200, 500, 1000, 2000, 5000, 10000}, we repeat the execution of every algorithm 50 times and report averages and variances across runs. No specific train/test/validation splits are mentioned for MNIST, Fashion-MNIST, or Cover Type.
Hardware Specification	Yes	The experiments were carried out on the Google Colab default CPU.
Software Dependencies	No	In particular, our use of NumPy's vectorized operations can lead to non-obvious runtime behavior, as such operations benefit from low-level optimizations (e.g., memory locality, multi-threaded backends) and often incur fixed overheads. No specific versions provided for NumPy or any other software.
Experiment Setup	Yes	For each sample size m ∈ {10, 15, 20, 25, 30, 100, 200, 500, 1000, 2000, 5000, 10000}, we repeat the execution of every algorithm 50 times and report averages and variances across runs. We implement both MINSUMSELECT and FASTGD. As described in Section 4, we take as initial guess the coordinate-wise median of the computed sample means.
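
The Pseudocode and Experiment Setup rows above can be read together as the following minimal sketch. It is not the authors' implementation (their code is at the repository linked in the Open Source Code row). The constants a and b, the function names, the Euclidean error metric, and the harness are assumptions made for illustration. AGGREGATE is replaced here by a plain coordinate-wise median, which the paper mentions only as the initial guess, not by MINSUMSELECT or FASTGD.

```python
import numpy as np


def coordinatewise_median(batch_means: np.ndarray) -> np.ndarray:
    """Stand-in AGGREGATE (assumption): coordinate-wise median of the batch means."""
    return np.median(batch_means, axis=0)


def mean_estimate(points, eps, delta, a=1.0, b=1.0,
                  aggregate=coordinatewise_median, rng=None):
    """Sketch of Algorithm 1: aggregate the means of b*log(1/delta) batches
    of a/eps points, each drawn independently and uniformly at random."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = points.shape
    num_batches = max(1, int(np.ceil(b * np.log(1.0 / delta))))
    batch_size = max(1, int(np.ceil(a / eps)))
    batch_means = np.empty((num_batches, d))
    for i in range(num_batches):
        # Independent uniform draws, i.e. sampling with replacement.
        idx = rng.integers(0, n, size=batch_size)
        batch_means[i] = points[idx].mean(axis=0)
    return aggregate(batch_means)


def repeated_errors(points, eps, delta, runs=50, seed=0):
    """Hypothetical harness: repeat the estimator `runs` times and return the
    average and variance of the Euclidean error against the exact mean."""
    rng = np.random.default_rng(seed)
    true_mean = points.mean(axis=0)
    errors = np.array([
        np.linalg.norm(mean_estimate(points, eps, delta, rng=rng) - true_mean)
        for _ in range(runs)
    ])
    return errors.mean(), errors.var()
```

For example, repeated_errors(X, eps=0.1, delta=0.05) on an n-by-d array X returns the average and variance of the error over 50 runs. How the paper's sample size m maps onto eps and delta is not stated in the rows above, so that mapping is left to the caller.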