Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions

Authors: Ilias Diakonikolas, Daniel Kane, Jasper Lee, Ankit Pensia

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type Theoretical "Our main result is the first computationally efficient robust mean estimator in the ℓ_{2,k} norm, with performance matching the conjectured computational-statistical tradeoff, under the standard heavy-tailed assumption that the covariance Σ ⪯ I and the additional mild assumption that the 4th moment is bounded in all axis directions." The Acknowledgements section also answers the experiment checklist with [N/A] throughout: "3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A] (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [N/A] (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]"
Researcher Affiliation Academia Ilias Diakonikolas University of Wisconsin-Madison ilias@cs.wisc.edu Daniel M. Kane University of California, San Diego dakane@cs.ucsd.edu Jasper C.H. Lee University of Wisconsin-Madison jasper.lee@wisc.edu Ankit Pensia University of Wisconsin-Madison ankitp@cs.wisc.edu
Pseudocode Yes Algorithm 1 Robust Sparse Mean Estimation with High Probability 1. Input: An ε-corrupted sample set S ⊂ R^d of size n 2. Median-of-Means pre-processing: Group points into g groups, each of size m = n/g, where g = 55n55, and take the sample mean of a group to be a new point 3. Compute initial coordinate-wise median-of-means estimate 55 4. Truncate all points to within B_∞(55, 255k), namely, given a point x, we replace it with the point h_{a,55}(x), where h_{a,b} is defined in Equation (1). 5. Run the stability-based robust sparse mean estimator from Fact 1.6 on the samples after the processing of Step 4.
Open Source Code No The paper does not provide concrete access to source code. The authors state '[N/A]' for questions related to code availability in the self-assessment section.
Open Datasets No The paper is theoretical and does not report experimental results, thus no specific dataset information for training is provided.
Dataset Splits No The paper is theoretical and does not report experimental results, thus no specific dataset split information for validation is provided.
Hardware Specification No The paper is theoretical and does not report experimental results. The self-assessment section explicitly states '[N/A]' for questions regarding compute resources.
Software Dependencies No The paper is theoretical and does not report experimental results, thus no specific ancillary software details with version numbers are provided.
Experiment Setup No The paper is theoretical and does not report experimental results, thus no specific experimental setup details like hyperparameters or training configurations are provided.
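
The pre-processing steps of Algorithm 1 quoted in the Pseudocode row (Steps 2 to 4) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. Several things here are assumptions: the function names are hypothetical, the group count and truncation radius are passed in as parameters rather than computed from the paper's formulas, the initial estimate is taken to be the coordinate-wise median of the group means from Step 2, and the truncation map is assumed to be coordinate-wise clipping to an interval around that estimate. Step 5 (the stability-based estimator from Fact 1.6) is not implemented.

```python
from statistics import median


def group_means(points, num_groups):
    """Step 2 (sketch): split points into equal-size groups and average each group."""
    m = len(points) // num_groups  # group size m = n/g; leftover points are dropped
    if m == 0:
        raise ValueError("need at least one point per group")
    d = len(points[0])
    means = []
    for j in range(num_groups):
        group = points[j * m:(j + 1) * m]
        means.append([sum(p[i] for p in group) / m for i in range(d)])
    return means


def coordinatewise_median(points):
    """Step 3 (sketch): take the median of each coordinate across the points."""
    d = len(points[0])
    return [median(p[i] for p in points) for i in range(d)]


def truncate(point, center, radius):
    """Step 4 (assumed form): clip each coordinate to [center_i - radius, center_i + radius]."""
    return [min(max(x, c - radius), c + radius) for x, c in zip(point, center)]


def preprocess(points, num_groups, radius):
    """Run Steps 2-4 and return the initial estimate and the truncated group means."""
    means = group_means(points, num_groups)
    center = coordinatewise_median(means)
    return center, [truncate(p, center, radius) for p in means]
```

Calling `preprocess(points, num_groups, radius)` returns the initial estimate together with the truncated group means, which would then be handed to whichever robust sparse mean estimator plays the role of Step 5.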