Robust Regression via Hard Thresholding
Authors: Kush Bhatia, Prateek Jain, Purushottam Kar
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Several numerical simulations were carried out on linear regression problems in low-dimensional as well as sparse high-dimensional settings. Empirically, TORRENT, and more so its extensions, offer significantly faster recovery than state-of-the-art L1 solvers: the proposed algorithms (TORRENT-FC/GD/HYB) can be significantly faster than the best L1 solvers, exhibit better recovery properties, and are more robust to dense white noise. |
| Researcher Affiliation | Collaboration | Microsoft Research, India; Indian Institute of Technology Kanpur, India. {t-kushb,prajain}@microsoft.com, purushot@cse.iitk.ac.in |
| Pseudocode | Yes | Algorithm 1: TORRENT (Thresholding Operator-based Robust RegrEssioN meThod); Algorithm 2: UPDATE for TORRENT-FC; Algorithm 3: UPDATE for TORRENT-GD; Algorithm 4: UPDATE for TORRENT-HYB. (A minimal Python sketch of the fully corrective variant appears after the table.) |
| Open Source Code | No | The paper does not provide a specific link or statement indicating that the source code for their proposed methods (TORRENT-FC, -GD, -HYB, -HD) is openly available. |
| Open Datasets | No | Data: For the low-dimensional setting, the regressor $w^* \in \mathbb{R}^p$ was chosen to be a random unit-norm vector. Data was sampled as $x_i \sim \mathcal{N}(0, I_p)$ and response variables were generated as $y^*_i = \langle w^*, x_i \rangle$. The set of corrupted points $S^*$ was selected as a uniformly random $(\alpha n)$-sized subset of $[n]$ and the corruptions were set to $b_i \sim U(-5\|y^*\|_\infty, 5\|y^*\|_\infty)$ for $i \in S^*$. The corrupted responses were then generated as $y_i = y^*_i + b_i + \varepsilon_i$ where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$. For the sparse high-dimensional setting, $\mathrm{supp}(w^*)$ was selected to be a random $s^*$-sized subset of $[p]$. The paper describes a synthetic data generation process (a hedged sketch follows the table) but does not mention the use of any publicly available or open datasets with concrete access information. |
| Dataset Splits | No | The paper describes how synthetic data was generated for experiments but does not explicitly state training, validation, or test dataset splits in terms of percentages, counts, or predefined splits. |
| Hardware Specification | Yes | Both the L1 solver, as well as our methods, were implemented in Matlab and were run on a single core 2.4GHz machine with 8 GB RAM. |
| Software Dependencies | No | The paper states that the methods were 'implemented in Matlab' but does not specify a version number for Matlab or any other software dependencies with their versions. |
| Experiment Setup | Yes | Input: Training data $\{x_i, y_i\}$, $i = 1 \ldots n$, step length $\eta$, thresholding parameter $\beta$, tolerance $\epsilon$. Phase-transition diagrams (Figure 1) were generated by repeating each experiment 100 times. For all other plots, each experiment was run over 20 random instances of the data and the plots depict the mean results. For these experiments, $n$ was set to $5 s^* \log(p)$ and the fraction of corrupted points $\alpha$ was varied from 0.1 to 0.7. An algorithm was deemed successful on an instance if it obtained a model $\hat{w}$ with error $r_{\hat{w}} < 10^{-4} \cdot \|w^*\|_2$ (see the success check in the data-generation sketch after the table). |
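
The paper's pseudocode alternates between a model update and a hard thresholding step that re-selects the points presumed to be uncorrupted. As a concrete illustration of the fully corrective variant (TORRENT-FC), here is a minimal Python/NumPy sketch: fit ordinary least squares on the current active set, then keep the points with the smallest absolute residuals. The function name `torrent_fc`, the stopping rule, and the iteration cap are illustrative choices, not the authors' (Matlab, unreleased) implementation.

```python
import numpy as np

def torrent_fc(X, y, beta, eps=1e-4, max_iter=100):
    """Minimal sketch of TORRENT-FC (fully corrective variant).

    X: (n, p) design matrix; y: (n,) responses;
    beta: assumed upper bound on the fraction of corrupted points.
    """
    n, p = X.shape
    keep = n - int(np.ceil(beta * n))  # size of the active (presumed-clean) set
    S = np.arange(n)                   # start with every point active
    w = np.zeros(p)
    for _ in range(max_iter):
        # Fully corrective step: ordinary least squares on the active set.
        w, *_ = np.linalg.lstsq(X[S], y[S], rcond=None)
        # Hard thresholding step: keep the points with smallest residuals.
        r = np.abs(y - X @ w)
        S = np.argsort(r)[:keep]
        # Stop once the fit explains the active responses to tolerance eps.
        if np.linalg.norm(y[S] - X[S] @ w) <= eps:
            break
    return w
```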
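
The synthetic data recipe quoted in the Open Datasets row is equally easy to mirror. The sketch below follows the stated low-dimensional recipe under our reading of the reconstructed notation (corruptions uniform on $\pm 5\|y^*\|_\infty$); reading $r_{\hat{w}}$ as the recovery error $\|\hat{w} - w^*\|_2$, the final lines show the success criterion from the Experiment Setup row, reusing the `torrent_fc` sketch above.

```python
import numpy as np

def make_corrupted_regression(n, p, alpha, sigma, seed=0):
    """Sketch of the paper's low-dimensional synthetic data generation."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(p)
    w_star /= np.linalg.norm(w_star)        # random unit-norm regressor w*
    X = rng.standard_normal((n, p))         # x_i ~ N(0, I_p)
    y_clean = X @ w_star                    # y*_i = <w*, x_i>
    S = rng.choice(n, size=int(alpha * n), replace=False)  # corrupted set S*
    b = np.zeros(n)
    scale = 5 * np.abs(y_clean).max()       # 5 * ||y*||_inf (our reading)
    b[S] = rng.uniform(-scale, scale, size=S.size)
    y = y_clean + b + sigma * rng.standard_normal(n)  # add dense white noise
    return X, y, w_star

# Success criterion from the Experiment Setup row (repeated 100 times for
# the phase-transition diagrams); torrent_fc is the sketch defined above.
X, y, w_star = make_corrupted_regression(n=1000, p=50, alpha=0.3, sigma=0.0)
w_hat = torrent_fc(X, y, beta=0.4)
success = np.linalg.norm(w_hat - w_star) < 1e-4 * np.linalg.norm(w_star)
```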