Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Robust Estimation Under Heterogeneous Corruption Rates

Authors: Syomantak Chaudhuri, Jerry Li, Thomas Courtade

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	While we emphasize that the main contribution of this work is theoretical, in the supplementary material, we also perform some preliminary synthetic evaluations to validate the effectiveness of our methods. In the bounded and univariate Gaussian settings, we demonstrate that both the thresholding-based methods as well as the per-sample reweighting methods outperform baselines from the standard homogeneous robust statistics literature. Our results also demonstrate that in some settings, the per-sample reweighting methods also yield improvements over the threshold-based methods in practice.
Researcher Affiliation	Academia	Syomantak Chaudhuri (University of California, Berkeley); Jerry Li (University of Washington); Thomas A. Courtade (University of California, Berkeley)
Pseudocode	Yes	Algorithm 1: Robust Mean Estimation for Bounded Distributions
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Attached Jupyter notebooks allow reviewers to verify the plots without writing any extra code.
Open Datasets	No	We set n = 10^4 and for a fixed value q, we sample the corruption rates λ i.i.d. from the distribution with cdf given by F(t) = 1 - (1 - t)^q. ... For bounded distribution, we choose r = 1 and choose the true underlying distribution to be the point mass at 0, and the corrupted values to be 1. For univariate Gaussian distribution, we fix the true distribution to be N(0, 1) and the corrupted values sampled i.i.d. from N(100, 1).
Dataset Splits	No	We set n = 10^4 and for a fixed value q, we sample the corruption rates λ i.i.d. from the distribution with cdf given by F(t) = 1 - (1 - t)^q. As q increases we can expect a higher corruption rate. Fixing this sampled λ, we sample the dataset 10^4 times. For bounded distribution, we plot the mean squared-error and the corresponding standard deviations over the trials at each value of q considered. For the Gaussian distribution, we plot the empirical 4/5-th quantile of the squared-error along with 15/20-th and 17/20-th quantiles over the trials.
Hardware Specification	No	Justification: Experiments performed are not computationally expensive and can be performed on most laptops on CPU (without GPU).
Software Dependencies	No	The paper mentions 'Jupyter notebooks' in the context of open-source code availability, but it does not specify any software versions for programming languages, libraries, or other tools used.
Experiment Setup	Yes	We set n = 10^4 and for a fixed value q, we sample the corruption rates λ i.i.d. from the distribution with cdf given by F(t) = 1 - (1 - t)^q. ... For bounded distribution, we choose r = 1 and choose the true underlying distribution to be the point mass at 0, and the corrupted values to be 1. For univariate Gaussian distribution, we fix the true distribution to be N(0, 1) and the corrupted values sampled i.i.d. from N(100, 1).
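
The synthetic setup quoted in the Experiment Setup row could be generated roughly as in the following sketch. It is not the authors' notebook code. It assumes the cdf reads F(t) = 1 - (1 - t)^q on [0, 1] (the signs and exponent are an assumption, since the extracted formula is ambiguous), and it assumes that sample i is drawn from the corrupting distribution with probability λ_i and from the true distribution otherwise. The function names, the seed, and the value q = 2 are hypothetical choices for illustration.

```python
import random


def sample_corruption_rates(n, q, rng):
    """Draw n corruption rates by inverse-cdf sampling.

    ASSUMPTION: the cdf is F(t) = 1 - (1 - t)^q on [0, 1].
    Solving u = F(t) for t gives t = 1 - (1 - u)^(1/q).
    """
    return [1.0 - (1.0 - rng.random()) ** (1.0 / q) for _ in range(n)]


def sample_bounded(rates, rng):
    """Bounded case: true distribution is a point mass at 0, corrupted value is 1.

    ASSUMPTION: sample i is corrupted with probability rates[i].
    """
    return [1.0 if rng.random() < lam else 0.0 for lam in rates]


def sample_gaussian(rates, rng):
    """Gaussian case: true distribution N(0, 1), corrupted values from N(100, 1).

    ASSUMPTION: sample i is corrupted with probability rates[i].
    """
    return [
        rng.gauss(100.0, 1.0) if rng.random() < lam else rng.gauss(0.0, 1.0)
        for lam in rates
    ]


rng = random.Random(0)   # hypothetical seed, for repeatability only
n = 10_000               # n = 10^4, as stated in the quoted setup
q = 2.0                  # hypothetical value; the quoted text does not list the q grid

rates = sample_corruption_rates(n, q, rng)
bounded_data = sample_bounded(rates, rng)
gaussian_data = sample_gaussian(rates, rng)
```

Per the quoted text, the rates are sampled once for each q and then held fixed while the dataset is redrawn 10^4 times; in this sketch that corresponds to calling `sample_bounded` or `sample_gaussian` repeatedly with the same `rates`. The attached Jupyter notebooks remain the authoritative reference for the exact parameterization.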