Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Outlier Robust Mean Estimation with Subgaussian Rates via Stability
Authors: Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the problem of outlier robust high-dimensional mean estimation under a finite covariance assumption, and more broadly under finite low-degree moment assumptions. We consider a standard stability condition from the recent robust statistics literature and prove that, except with exponentially small failure probability, there exists a large fraction of the inliers satisfying this condition. As a corollary, it follows that a number of recently developed algorithms for robust mean estimation, including iterative filtering and non-convex gradient descent, give optimal error estimators with (near-)subgaussian rates. |
| Researcher Affiliation | Academia | Ilias Diakonikolas (University of Wisconsin-Madison); Daniel M. Kane (University of California, San Diego); Ankit Pensia (University of Wisconsin-Madison) |
| Pseudocode | No | The paper describes steps for algorithms in prose, such as the pre-processing step using the median-of-means principle. However, it does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured code-like formatting. |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available or provide a link to a code repository. |
| Open Datasets | No | The paper is theoretical and discusses properties of distributions and samples, but does not refer to specific, named datasets (e.g., MNIST, CIFAR-10) or provide information about public access to any training data used for experiments. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with data splits, thus no validation split information is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require specifying hardware used. |
| Software Dependencies | No | The paper is theoretical and does not describe any computational experiments that would require specifying software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments with specific hyperparameters or training configurations. |
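
The Pseudocode row notes that the paper describes a pre-processing step based on the median-of-means principle only in prose. As a rough illustration of that general principle, here is a minimal one-dimensional sketch. It is a generic textbook version, not the paper's procedure: the function name `median_of_means`, the `num_buckets` and `seed` parameters, and the random bucketing scheme are all assumptions made for this example.

```python
import random
import statistics


def median_of_means(samples, num_buckets, seed=0):
    """Generic one-dimensional median-of-means estimate (illustrative only).

    Assumption: this is the textbook estimator, not the high-dimensional
    pre-processing step analyzed in the paper.
    """
    if not 1 <= num_buckets <= len(samples):
        raise ValueError("num_buckets must be between 1 and len(samples)")

    # Shuffle a copy so that bucket membership does not depend on input order.
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)

    # Deal the shuffled samples into num_buckets nearly equal-sized buckets.
    buckets = [shuffled[i::num_buckets] for i in range(num_buckets)]

    # Average within each bucket, then take the median across buckets.
    bucket_means = [statistics.fmean(bucket) for bucket in buckets]
    return statistics.median(bucket_means)


if __name__ == "__main__":
    # 99 clean points at 1.0 plus one gross outlier.
    data = [1.0] * 99 + [1e6]
    print("plain mean:     ", statistics.fmean(data))
    print("median of means:", median_of_means(data, num_buckets=11))
```

In this toy input the single outlier can corrupt only one of the 11 bucket means, so the median across buckets stays at 1.0 while the plain mean is pulled to roughly 10001. The choice of 11 buckets is arbitrary here; in general the number of buckets trades off robustness against the variance of each bucket mean.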