List-decodable Linear Regression
Authors: Sushrut Karmalkar, Adam Klivans, Pravesh Kothari
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than 1/2 fraction of examples. For any α < 1, our algorithm takes as input a sample {(xᵢ, yᵢ)}ᵢ₌₁..ₙ of n linear equations where αn of the equations satisfy yᵢ = ⟨xᵢ, ℓ*⟩ + ζ for some small noise ζ and (1 − α)n of the equations are arbitrarily chosen. It outputs a list L of size O(1/α), a fixed constant, that contains an ℓ that is close to ℓ*. Our algorithm succeeds whenever the inliers are chosen from a certifiably anti-concentrated distribution D. As a corollary of our algorithmic result, we obtain a (d/α)^O(1/α^8)-time algorithm to find a O(1/α)-size list when the inlier distribution is the standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves a similar guarantee under the promise that ℓ* has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. To solve the problem, we introduce a new framework for list-decodable learning that strengthens the identifiability-to-algorithms paradigm based on the sum-of-squares method. |
| Researcher Affiliation | Academia | Sushrut Karmalkar, University of Texas at Austin (sushrutk@cs.utexas.edu); Adam R. Klivans, University of Texas at Austin (klivans@cs.utexas.edu); Pravesh K. Kothari, Princeton University and Institute for Advanced Study (kothari@cs.princeton.edu) |
| Pseudocode | No | The paper describes an 'inefficient algorithm' and then 'An efficient algorithm' using numbered steps, but these are high-level prose descriptions and not formatted as structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about making its source code available, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper uses a theoretical model for sample generation (Model 1.1 Robust Linear Regression) rather than a specific named or publicly available dataset. It does not provide any access information for a dataset. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and proofs, not empirical evaluation. Therefore, it does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup. Therefore, it does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, presenting algorithms and proofs. It does not include details about an experimental setup, such as hyperparameters or system-level training settings. |
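The corruption model quoted above (Model 1.1) is easy to make concrete: αn inliers satisfy yᵢ = ⟨xᵢ, ℓ*⟩ + ζ for small noise ζ, and the remaining (1 − α)n equations are arbitrary. The sketch below is a hypothetical illustration of that data-generation process, not code from the paper; the function name, the decoy-regressor choice for the outliers, and the noise scale are all our own assumptions.

```python
import numpy as np

def generate_list_decodable_sample(n, d, alpha, ell_star, noise_scale=0.01, seed=None):
    """Sketch of Model 1.1: alpha*n inliers satisfy y_i = <x_i, ell*> + zeta
    with small noise zeta; the other (1 - alpha)*n pairs are arbitrary
    (here: consistent with a planted decoy vector, a worst case that shows
    why only a list of candidate regressors is recoverable)."""
    rng = np.random.default_rng(seed)
    n_in = int(alpha * n)
    # Inliers: x_i from the standard Gaussian, an anti-concentrated distribution.
    x_in = rng.standard_normal((n_in, d))
    y_in = x_in @ ell_star + noise_scale * rng.standard_normal(n_in)
    # Outliers: adversarially chosen; a decoy regressor makes them internally consistent.
    decoy = rng.standard_normal(d)
    x_out = rng.standard_normal((n - n_in, d))
    y_out = x_out @ decoy
    return np.vstack([x_in, x_out]), np.concatenate([y_in, y_out])

X, y = generate_list_decodable_sample(n=1000, d=10, alpha=0.3,
                                      ell_star=np.ones(10), seed=0)
```

With α = 0.3, a majority of the equations are corrupted, so no single-output estimator can succeed; the paper's guarantee is that ℓ* appears in a list of O(1/α) candidates.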