Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Outlier Robust Mean Estimation with Subgaussian Rates via Stability

Authors: Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia

NeurIPS 2020

Research Type: Theoretical
LLM Response: We study the problem of outlier robust high-dimensional mean estimation under a finite covariance assumption, and more broadly under finite low-degree moment assumptions. We consider a standard stability condition from the recent robust statistics literature and prove that, except with exponentially small failure probability, there exists a large fraction of the inliers satisfying this condition. As a corollary, it follows that a number of recently developed algorithms for robust mean estimation, including iterative filtering and non-convex gradient descent, give optimal error estimators with (near-)subgaussian rates.

Researcher Affiliation: Academia
LLM Response: Ilias Diakonikolas (University of Wisconsin-Madison), Daniel M. Kane (University of California, San Diego), Ankit Pensia (University of Wisconsin-Madison)

Pseudocode: No
LLM Response: The paper describes algorithmic steps in prose, such as the pre-processing step based on the median-of-means principle, but it does not include explicitly labeled "Pseudocode" or "Algorithm" blocks with structured, code-like formatting.

Open Source Code: No
LLM Response: The paper does not state that its source code is publicly available, nor does it link to a code repository.

Open Datasets: No
LLM Response: The paper is theoretical; it discusses properties of distributions and samples but does not refer to specific, named datasets (e.g., MNIST, CIFAR-10) or provide public access information for any training data.

Dataset Splits: No
LLM Response: The paper is theoretical and describes no empirical experiments with data splits, so no validation split information is provided.

Hardware Specification: No
LLM Response: The paper is theoretical and describes no computational experiments that would require specifying the hardware used.

Software Dependencies: No
LLM Response: The paper is theoretical and describes no computational experiments that would require specifying software dependencies with version numbers.

Experiment Setup: No
LLM Response: The paper is theoretical and describes no empirical experiments with specific hyperparameters or training configurations.
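The Pseudocode row notes that the paper describes a median-of-means pre-processing step in prose only. For readers unfamiliar with that principle, here is a minimal, hedged sketch of a coordinate-wise median-of-means mean estimator; the function name, bucket count, and the fixed shuffling seed are illustrative assumptions and are not taken from the paper, which works with a more refined stability-based analysis.

```python
import numpy as np

def median_of_means(samples, num_buckets=10):
    """Coordinate-wise median-of-means mean estimator (illustrative sketch).

    Shuffles the samples, splits them into disjoint buckets, averages
    within each bucket, and returns the coordinate-wise median of the
    bucket means. A small number of outliers can corrupt only a few
    buckets, so the median over bucket means stays near the true mean.
    """
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(0)  # fixed seed, chosen for reproducibility of this sketch
    shuffled = samples[rng.permutation(len(samples))]
    buckets = np.array_split(shuffled, num_buckets)
    bucket_means = np.array([bucket.mean(axis=0) for bucket in buckets])
    return np.median(bucket_means, axis=0)
```

The key design point is that each outlier can contaminate at most one bucket, so as long as fewer than half the buckets contain outliers, the median of the bucket means is unaffected by them, whereas the naive sample mean is pulled toward the outliers in proportion to their magnitude.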