Outlier-Robust Sparse Estimation via Non-Convex Optimization

Authors: Yu Cheng, Ilias Diakonikolas, Rong Ge, Shivam Gupta, Daniel Kane, Mahdi Soltanolkotabi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform an experimental evaluation of our robust sparse mean estimation algorithm on synthetic datasets with a focus on statistical accuracy (ℓ2-distance between the output and the true sparse mean). We evaluate our algorithm (Sparse Gradient Descent, Sparse GD) on different noise models, and compare it to the following previous algorithms: ... Our experimental results are summarized in Figures 1, 2 and 3.
Researcher Affiliation | Academia | Yu Cheng, Brown University, Providence, RI 02912, yu_cheng@brown.edu; Ilias Diakonikolas, University of Wisconsin-Madison, Madison, WI 53706, ilias@cs.wisc.edu; Rong Ge, Duke University, Durham, NC 27708, rongge@cs.duke.edu; Shivam Gupta, University of Texas at Austin, Austin, TX 78712, shivamgupta@utexas.edu; Daniel M. Kane, University of California, San Diego, La Jolla, CA 92093, dakane@cs.ucsd.edu; Mahdi Soltanolkotabi, University of Southern California, Los Angeles, CA 90089, soltanol@usc.edu
Pseudocode | Yes | Algorithm 1: Robust sparse mean estimation. Input: k > 0, 0 < ϵ < ϵ_0, and an ϵ-corrupted set of samples (X_i)_{i=1}^n drawn from a distribution with k-sparse mean µ. Output: a vector µ̂ that is close to µ. 1: Find a first-order stationary point w ∈ ∆_{n,ϵ} of the objective min_w f(w) = ‖Σ_w − I‖_{F,k,k}. 2: Return µ̂ = (µ_w)_Q, where Q is a set of k entries of µ_w with largest magnitude. (A Python sketch of this procedure appears after the table.)
Open Source Code | Yes | An implementation of our algorithms is available at https://github.com/guptashvm/Sparse-GD.
Open Datasets | No | The paper uses 'synthetic datasets' but does not provide concrete access information for them (a specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets).
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | Yes | We ran our experiments on a computer with a 1.6 GHz Intel Core i5 processor and 8 GB RAM.
Software Dependencies | No | The paper mentions building on an existing codebase but does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | We evaluate the algorithms on various noise models. Linear-hiding noise: the inliers are drawn from N(0, I). Let S be a size-k set. Then, half the outliers are drawn from N(1_S, I) and the other half are drawn from N(0, 2I − I_S). Tail-flipping noise: this noise model picks a k-sparse direction v and replaces the ϵ fraction of points farthest in the −v direction with points in the +v direction. Constant-bias noise: this model adds a constant to every coordinate of the outlier points; in Figure 3, we add 2 to every coordinate of every outlier point. (Illustrative generators for these noise models are sketched after the table.)
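
For concreteness, the following NumPy sketch shows one way the quoted Algorithm 1 could be instantiated. It is not the authors' implementation (that is the Sparse-GD repository linked in the Open Source Code row): the capped-simplex feasible set ∆_{n,ϵ} = {w : Σ_i w_i = 1, 0 ≤ w_i ≤ 1/((1−ϵ)n)}, the diagonal-based approximation of the ‖·‖_{F,k,k} norm, and the step size and iteration count below are assumptions made for illustration only.

```python
# A minimal NumPy sketch of the quoted Algorithm 1 (not the authors' code; see the
# Sparse-GD repository above). Assumed here: the feasible weight set is the capped
# simplex Delta_{n,eps} = {w : sum(w) = 1, 0 <= w_i <= 1/((1 - eps) n)}, and the
# ||.||_{F,k,k} norm is approximated by restricting Sigma_w - I to the k coordinates
# with the largest diagonal deviation. Step size and iteration count are illustrative.
import numpy as np


def project_capped_simplex(w, cap):
    """Project w onto {v : sum(v) = 1, 0 <= v_i <= cap} by bisecting on a shift tau."""
    lo, hi = np.min(w) - 1.0, np.max(w)
    for _ in range(60):
        tau = (lo + hi) / 2.0
        if np.clip(w - tau, 0.0, cap).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(w - (lo + hi) / 2.0, 0.0, cap)


def sparse_gd_mean(X, k, eps, n_iters=200, lr=0.5):
    """Sketch of robust k-sparse mean estimation on an eps-corrupted sample X of shape (n, d)."""
    n, d = X.shape
    cap = 1.0 / ((1.0 - eps) * n)
    w = np.full(n, 1.0 / n)                      # start from uniform weights
    for _ in range(n_iters):
        mu_w = X.T @ w                           # weighted mean
        C = X - mu_w                             # centered samples
        Sigma_w = C.T @ (w[:, None] * C)         # weighted covariance
        A = Sigma_w - np.eye(d)
        # Approximate the (k, k)-sparse Frobenius norm by keeping the k coordinates
        # where the diagonal of Sigma_w deviates most from 1.
        Q = np.argsort(-np.abs(np.diag(A)))[:k]
        A_Q = A[np.ix_(Q, Q)]
        norm = np.linalg.norm(A_Q)
        if norm < 1e-12:
            break
        M = A_Q / norm                           # gradient of ||A_Q||_F with respect to A_Q
        # On the simplex, d Sigma_w / d w_i = (X_i - mu_w)(X_i - mu_w)^T, so the
        # gradient in w_i is a quadratic form of the centered sample restricted to Q.
        g = np.einsum('ni,ij,nj->n', C[:, Q], M, C[:, Q])
        w = project_capped_simplex(w - lr * g, cap)
    mu_w = X.T @ w
    # Step 2 of Algorithm 1: keep only the k entries of largest magnitude.
    mu_hat = np.zeros(d)
    top = np.argsort(-np.abs(mu_w))[:k]
    mu_hat[top] = mu_w[top]
    return mu_hat
```

The ℓ2 distance ‖µ̂ − µ‖_2 between the returned vector and the true sparse mean is the statistical-accuracy measure quoted in the Research Type row.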
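
Similarly, the three noise models quoted in the Experiment Setup row can be read as the generators below. This is a sketch of one reading of that description, not the authors' data-generation code; the sample size, dimension, corruption placement, the choice of true mean 0 for the inliers, and the exact tail-flipping replacement rule are assumptions.

```python
# Illustrative generators for the three quoted noise models; parameter values and
# the exact outlier constructions are one reading of the description, not the
# authors' experiment code.
import numpy as np

rng = np.random.default_rng(0)


def linear_hiding(n, d, k, eps):
    """Inliers ~ N(0, I); half the outliers ~ N(1_S, I), the rest ~ N(0, 2I - I_S)."""
    X = rng.standard_normal((n, d))              # n inliers (true mean is 0 here)
    n_out = int(eps * n)
    S = rng.choice(d, size=k, replace=False)     # hidden size-k support
    shift = np.zeros(d)
    shift[S] = 1.0
    half = n_out // 2
    X[:half] = rng.standard_normal((half, d)) + shift               # N(1_S, I)
    scale = np.full(d, np.sqrt(2.0))
    scale[S] = 1.0
    X[half:n_out] = rng.standard_normal((n_out - half, d)) * scale  # N(0, 2I - I_S)
    return X


def tail_flipping(n, d, k, eps):
    """Move the eps fraction of points farthest in the -v direction to the +v direction."""
    X = rng.standard_normal((n, d))
    n_out = int(eps * n)
    v = np.zeros(d)
    v[rng.choice(d, size=k, replace=False)] = 1.0 / np.sqrt(k)  # k-sparse unit direction
    idx = np.argsort(X @ v)[:n_out]              # most negative projections onto v
    proj = X[idx] @ v
    X[idx] -= 2.0 * proj[:, None] * v            # reflect the tail onto the +v side (one reading)
    return X


def constant_bias(n, d, eps, bias=2.0):
    """Add a constant (2 in Figure 3) to every coordinate of each outlier point."""
    X = rng.standard_normal((n, d))
    n_out = int(eps * n)
    X[:n_out] += bias
    return X
```

For example, running sparse_gd_mean(linear_hiding(n=5000, d=1000, k=20, eps=0.1), k=20, eps=0.1) mirrors the kind of synthetic run summarized in Figures 1, 2, and 3, with the error measured against the true mean (here the zero vector).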