Robust Regression via Hard Thresholding
Authors: Kush Bhatia, Prateek Jain, Purushottam Kar
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Several numerical simulations were carried out on linear regression problems in low-dimensional as well as sparse high-dimensional settings. Empirically, TORRENT, and more so its extensions, offer significantly faster recovery than state-of-the-art L1 solvers: the proposed algorithms (TORRENT-FC/GD/HYB) can be significantly faster than the best L1 solvers, exhibit better recovery properties, and are more robust to dense white noise. |
| Researcher Affiliation | Collaboration | Microsoft Research, India; Indian Institute of Technology Kanpur, India. {t-kushb,prajain}@microsoft.com, purushot@cse.iitk.ac.in |
| Pseudocode | Yes | Algorithm 1: TORRENT (Thresholding Operator-based Robust RegrEssioN meThod); Algorithm 2: UPDATE for TORRENT-FC; Algorithm 3: UPDATE for TORRENT-GD; Algorithm 4: UPDATE for TORRENT-HYB. (A minimal Python sketch of the fully corrective variant appears after the table.) |
| Open Source Code | No | The paper does not provide a specific link or statement indicating that the source code for their proposed methods (TORRENT-FC, -GD, -HYB, -HD) is openly available. |
| Open Datasets | No | Data: For the low-dimensional setting, the regressor $w^* \in \mathbb{R}^p$ was chosen to be a random unit-norm vector. Data was sampled as $x_i \sim \mathcal{N}(0, I_p)$ and response variables were generated as $y^*_i = \langle w^*, x_i \rangle$. The set of corrupted points $S^*$ was selected as a uniformly random $(\alpha n)$-sized subset of $[n]$ and the corruptions were set to $b_i \sim U(-5\|y^*\|_\infty, 5\|y^*\|_\infty)$ for $i \in S^*$. The corrupted responses were then generated as $y_i = y^*_i + b_i + \varepsilon_i$ where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$. For the sparse high-dimensional setting, $\mathrm{supp}(w^*)$ was selected to be a random $s^*$-sized subset of $[p]$. The paper describes a synthetic data generation process (a hedged sketch follows the table) but does not mention the use of any publicly available or open datasets with concrete access information. |
| Dataset Splits | No | The paper describes how synthetic data was generated for experiments but does not explicitly state training, validation, or test dataset splits in terms of percentages, counts, or predefined splits. |
| Hardware Specification | Yes | Both the L1 solver, as well as our methods, were implemented in Matlab and were run on a single core 2.4GHz machine with 8 GB RAM. |
| Software Dependencies | No | The paper states that the methods were 'implemented in Matlab' but does not specify a version number for Matlab or any other software dependencies with their versions. |
| Experiment Setup | Yes | Input: Training data $\{x_i, y_i\}$, $i = 1 \ldots n$, step length $\eta$, thresholding parameter $\beta$, tolerance $\epsilon$. Phase-transition diagrams (Figure 1) were generated by repeating each experiment 100 times. For all other plots, each experiment was run over 20 random instances of the data and the plots depict the mean results. For these experiments, $n$ was set to $5 s^* \log(p)$ and the fraction of corrupted points $\alpha$ was varied from 0.1 to 0.7. An algorithm was deemed successful on an instance if it obtained a model $\hat{w}$ with error $r_{\hat{w}} < 10^{-4} \cdot \|w^*\|_2$ (see the success check in the data-generation sketch after the table). |
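
The paper's pseudocode alternates between a model update and a hard thresholding step that re-selects the points presumed to be uncorrupted. As a concrete illustration of the fully corrective variant (TORRENT-FC), here is a minimal Python/NumPy sketch: fit ordinary least squares on the current active set, then keep the points with the smallest absolute residuals. The function name `torrent_fc`, the stopping rule, and the iteration cap are illustrative choices, not the authors' (Matlab, unreleased) implementation.

```python
import numpy as np

def torrent_fc(X, y, beta, eps=1e-4, max_iter=100):
    """Minimal sketch of TORRENT-FC (fully corrective variant).

    X: (n, p) design matrix; y: (n,) responses;
    beta: assumed upper bound on the fraction of corrupted points.
    """
    n, p = X.shape
    keep = n - int(np.ceil(beta * n))  # size of the active (presumed-clean) set
    S = np.arange(n)                   # start with every point active
    w = np.zeros(p)
    for _ in range(max_iter):
        # Fully corrective step: ordinary least squares on the active set.
        w, *_ = np.linalg.lstsq(X[S], y[S], rcond=None)
        # Hard thresholding step: keep the points with smallest residuals.
        r = np.abs(y - X @ w)
        S = np.argsort(r)[:keep]
        # Stop once the fit explains the active responses to tolerance eps.
        if np.linalg.norm(y[S] - X[S] @ w) <= eps:
            break
    return w
```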
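
The synthetic data recipe quoted in the Open Datasets row is equally easy to mirror. The sketch below follows the stated low-dimensional recipe under our reading of the reconstructed notation (corruptions uniform on $\pm 5\|y^*\|_\infty$); reading $r_{\hat{w}}$ as the recovery error $\|\hat{w} - w^*\|_2$, the final lines show the success criterion from the Experiment Setup row, reusing the `torrent_fc` sketch above.

```python
import numpy as np

def make_corrupted_regression(n, p, alpha, sigma, seed=0):
    """Sketch of the paper's low-dimensional synthetic data generation."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(p)
    w_star /= np.linalg.norm(w_star)        # random unit-norm regressor w*
    X = rng.standard_normal((n, p))         # x_i ~ N(0, I_p)
    y_clean = X @ w_star                    # y*_i = <w*, x_i>
    S = rng.choice(n, size=int(alpha * n), replace=False)  # corrupted set S*
    b = np.zeros(n)
    scale = 5 * np.abs(y_clean).max()       # 5 * ||y*||_inf (our reading)
    b[S] = rng.uniform(-scale, scale, size=S.size)
    y = y_clean + b + sigma * rng.standard_normal(n)  # add dense white noise
    return X, y, w_star

# Success criterion from the Experiment Setup row (repeated 100 times for
# the phase-transition diagrams); torrent_fc is the sketch defined above.
X, y, w_star = make_corrupted_regression(n=1000, p=50, alpha=0.3, sigma=0.0)
w_hat = torrent_fc(X, y, beta=0.4)
success = np.linalg.norm(w_hat - w_star) < 1e-4 * np.linalg.norm(w_star)
```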