Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
On the Optimality of the Median-of-Means Estimator under Adversarial Contamination
Authors: Xabier de Juan, Santiago Mazuelas
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this appendix, we illustrate the theoretical results in previous sections with numerical simulations. In particular, the experiments show that MoM performs particularly well for symmetric distributions but does not fully leverage light-tails in accordance with the theoretical results. |
| Researcher Affiliation | Academia | ¹Basque Center of Applied Mathematics (BCAM), ²IKERBASQUE-Basque Foundation for Science |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format. It focuses on mathematical derivations and proofs. |
| Open Source Code | No | 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The numerical experiments are simple and can be easily coded. Moreover, numerical experiments are not part of the contribution of this paper. |
| Open Datasets | Yes | In Figure 1(a) we illustrate Theorem 3.2 by considering a Pareto distribution with finite variance. [...] In Figure 1(c) we depict Theorem 5.1 using a Student's t distribution. [...] In Figure 2, we show that the order $\alpha^{2/3}$ cannot be improved for half-normal distributions for any choice for the number of blocks $k$. |
| Dataset Splits | No | In all the results in this section, for a fixed estimator $\hat{\mu}$ and distribution $p$, we repeated the following procedure $n_{\mathrm{rep}}$ times: 1. Draw $n$ i.i.d. samples from $p$. 2. Contaminate the i.i.d. sample, where $\alpha$ is the fraction of contaminated samples. 3. Compute the estimation error $\lvert \hat{\mu} - \mu_p \rvert$. Finally, we plot the $1 - \delta$ quantile of the $n_{\mathrm{rep}}$ computed errors $\lvert \hat{\mu} - \mu_p \rvert$ as a function of $\alpha$. |
| Hardware Specification | No | The experimental results in the paper can be carried out in a regular desktop machine in few hours. |
| Software Dependencies | No | The paper does not specify any particular software dependencies or library versions used for the numerical experiments. |
| Experiment Setup | Yes | In all the results in this section, for a fixed estimator $\hat{\mu}$ and distribution $p$, we repeated the following procedure $n_{\mathrm{rep}}$ times: ... In table 2, we summarize the parameter of the various experimental results presented in the present section. In addition, we relate each figure to the corresponding theorem it illustrates. ... For the MoM estimator, in figs. 1(a) and 1(b) we have used a number of blocks $k = 3\alpha n$ (corresponding to $\gamma = 3$ in Theorems 3.2 and 3.3), whereas in fig. 1(c) we set $k = n/5$ (corresponding to $\beta = 1/5$ in Theorem 5.1). ... In all the experimental results, we consider the following adversarial attack: given an uncontaminated sample $X_1, X_2, \ldots, X_n$ the adversary removes the $\alpha n$ largest values and replaces them with $\alpha n$ new samples, all set to $\min_i X_i$. |
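
Since the paper releases no code, the procedure quoted in the Dataset Splits and Experiment Setup rows can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names, the random shuffle before blocking, the rounding of $\alpha n$, and the example values of `n`, `n_rep`, `delta`, and the Student's t degrees of freedom are all assumptions; the paper's actual parameters are in its Table 2, which is not reproduced here.

```python
import numpy as np


def median_of_means(x, k, rng):
    """Median-of-means: split x into k blocks, average each, take the median.

    Assumption: points are shuffled before blocking; the quoted text does
    not say how the blocks are formed.
    """
    x = rng.permutation(np.asarray(x, dtype=float))
    k = int(min(max(k, 1), x.size))  # keep 1 <= k <= n
    blocks = np.array_split(x, k)
    return float(np.median([b.mean() for b in blocks]))


def contaminate(x, alpha):
    """Quoted attack: replace the alpha*n largest values with min_i X_i.

    Assumption: alpha*n is rounded down to an integer count.
    """
    x = np.sort(np.asarray(x, dtype=float))
    m = int(np.floor(alpha * x.size))
    if m > 0:
        x[-m:] = x[0]  # x[0] is the sample minimum after sorting
    return x


def error_quantile(sample, mu, alpha, k, n, n_rep, delta, rng):
    """(1 - delta) quantile of |mu_hat - mu| over n_rep repetitions."""
    errors = np.empty(n_rep)
    for r in range(n_rep):
        x = contaminate(sample(rng, n), alpha)          # steps 1 and 2
        errors[r] = abs(median_of_means(x, k, rng) - mu)  # step 3
    return float(np.quantile(errors, 1 - delta))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_rep, delta = 500, 200, 0.05  # hypothetical values

    def student_t(rng, n):
        # Hypothetical choice: Student's t with 3 degrees of freedom (mean 0).
        return rng.standard_t(df=3, size=n)

    for alpha in (0.0, 0.02, 0.05, 0.1):
        # k = n/5 as quoted for fig. 1(c); for figs. 1(a) and 1(b) the
        # quoted choice would instead be k = 3 * alpha * n.
        q = error_quantile(student_t, 0.0, alpha, n // 5, n, n_rep, delta, rng)
        print(f"alpha={alpha:.2f}  error quantile={q:.4f}")
```

Plotting the printed quantile against `alpha` gives a curve of the kind the quoted procedure describes; swapping `student_t` for another sampler with a known mean covers the other distributions mentioned in the table.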