Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Consistent Robust Regression
Authors: Kush Bhatia, Prateek Jain, Parameswaran Kamalaruban, Purushottam Kar
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments were carried out on synthetically generated linear regression datasets with corruptions. All implementations were done in Matlab and were run on a single core 2.4GHz machine with 8GB RAM. The experiments establish the following: 1) CRR gives consistent estimates of the regression model, especially in situations with a large number of corruptions where the ordinary least squares estimator fails catastrophically, 2) CRR scales better to large datasets than the TORRENT-FC algorithm of [3] (up to 5× faster) and the Extended Lasso algorithm of [17] (up to 20× faster). |
| Researcher Affiliation | Collaboration | Kush Bhatia, University of California, Berkeley; Prateek Jain, Microsoft Research, India; Parameswaran Kamalaruban, EPFL, Switzerland; Purushottam Kar, Indian Institute of Technology, Kanpur |
| Pseudocode | Yes | Algorithm 1 CRR: Consistent Robust Regression |
| Open Source Code | No | The paper does not provide any links to open-source code repositories or explicitly state that the source code for their methodology is publicly available. |
| Open Datasets | No | The paper uses synthetically generated data and does not provide access information (link, citation, or repository) for a publicly available or open dataset. |
| Dataset Splits | No | The paper uses synthetically generated data but does not specify any training, validation, or test dataset splits (e.g., percentages, sample counts, or defined methodologies for partitioning the data). |
| Hardware Specification | Yes | All implementations were done in Matlab and were run on a single core 2.4GHz machine with 8GB RAM. |
| Software Dependencies | No | The paper states 'All implementations were done in Matlab' but does not specify a version number for Matlab or any other software dependencies with version numbers. |
| Experiment Setup | Yes | Data: The model $w^* \in \mathbb{R}^d$ was chosen to be a random unit-norm vector. The data was generated as $x_i \sim \mathcal{N}(0, I_d)$. The $k$ responses to be corrupted were chosen uniformly at random and the value of the corruptions was set as $b^*_i \sim \mathrm{Unif}(10, 20)$. Responses were then generated as $y_i = \langle x_i, w^* \rangle + \eta_i + b^*_i$ where $\eta_i \sim \mathcal{N}(0, \sigma^2)$. All reported results were averaged over 20 random trials. Evaluation Metric: We measure the performance of various algorithms using the standard $L_2$ error: $r_{\hat{w}} = \lVert \hat{w} - w^* \rVert_2$. For the timing experiments, we deemed an algorithm to converge on an instance if it obtained a model $w^t$ such that $\lVert w^t - w^{t-1} \rVert_2 \le 10^{-4}$. |
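
The experiment setup quoted in the last table row can be sketched in a few lines of code. The block below is only an illustration in Python using the standard library; the paper's implementations were in Matlab and are not publicly available, so this is not the authors' code. The function names (`make_corrupted_regression`, `l2_error`, `has_converged`) and the `seed` parameter are hypothetical, and the values of `n`, `d`, `k`, and `sigma` are left to the caller because the quoted text does not fix them.

```python
import math
import random


def make_corrupted_regression(n, d, k, sigma, seed=0):
    """Generate synthetic data following the setup quoted in the table.

    Returns (X, y, w_star, b_star). All names here are illustrative.
    """
    rng = random.Random(seed)

    # Random unit-norm model vector w* in R^d.
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in w))
    w_star = [v / norm for v in w]

    # Covariates x_i ~ N(0, I_d).
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

    # k responses chosen uniformly at random receive b*_i ~ Unif(10, 20).
    corrupted = set(rng.sample(range(n), k))
    b_star = [rng.uniform(10.0, 20.0) if i in corrupted else 0.0
              for i in range(n)]

    # y_i = <x_i, w*> + eta_i + b*_i, with eta_i ~ N(0, sigma^2).
    y = [
        sum(xj * wj for xj, wj in zip(X[i], w_star))
        + rng.gauss(0.0, sigma)
        + b_star[i]
        for i in range(n)
    ]
    return X, y, w_star, b_star


def l2_error(w_hat, w_star):
    """Standard L2 error ||w_hat - w*||_2 used as the evaluation metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_hat, w_star)))


def has_converged(w_t, w_prev, tol=1e-4):
    """Stopping rule from the timing experiments: ||w^t - w^{t-1}||_2 <= tol."""
    return l2_error(w_t, w_prev) <= tol
```

Calling `make_corrupted_regression` with 20 different seeds and averaging `l2_error` over the resulting estimates would mirror the "averaged over 20 random trials" protocol, under the assumption that each trial draws a fresh dataset.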