List-decodable Linear Regression

Authors: Sushrut Karmalkar, Adam Klivans, Pravesh Kothari

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than 1/2 fraction of examples. For any α < 1, our algorithm takes as input a sample {(x_i, y_i)}_{i ≤ n} of n linear equations where αn of the equations satisfy y_i = ⟨x_i, ℓ*⟩ + ζ for some small noise ζ and (1 − α)n of the equations are arbitrarily chosen. It outputs a list L of size O(1/α) (a fixed constant) that contains an ℓ̂ that is close to ℓ*. Our algorithm succeeds whenever the inliers are chosen from a certifiably anti-concentrated distribution D. As a corollary of our algorithmic result, we obtain a (d/α)^{O(1/α^8)}-time algorithm to find an O(1/α)-size list when the inlier distribution is the standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves similar guarantees under the promise that ℓ* has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. To solve the problem we introduce a new framework for list-decodable learning that strengthens the identifiability-to-algorithms paradigm based on the sum-of-squares method.
Researcher Affiliation | Academia | Sushrut Karmalkar (University of Texas at Austin, sushrutk@cs.utexas.edu); Adam R. Klivans (University of Texas at Austin, klivans@cs.utexas.edu); Pravesh K. Kothari (Princeton University and Institute for Advanced Study, kothari@cs.princeton.edu)
Pseudocode | No | The paper describes an 'inefficient algorithm' and then 'an efficient algorithm' using numbered steps, but these are high-level prose descriptions and not formatted as structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about making its source code available, nor does it provide any links to a code repository.
Open Datasets | No | The paper uses a theoretical model for sample generation (Model 1.1, Robust Linear Regression) rather than a specific named or publicly available dataset. It does not provide any access information for a dataset.
Dataset Splits | No | The paper is theoretical and focuses on algorithm design and proofs, not empirical evaluation. Therefore, it does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup, thus no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe any experimental setup. Therefore, it does not list any specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical, presenting algorithms and proofs. It does not include details about an experimental setup, such as hyperparameters or system-level training settings.
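The corruption model described in the abstract (Model 1.1, robust linear regression) can be sketched as a short simulation. This is an illustrative sketch, not code from the paper: the standard-Gaussian inlier distribution matches the paper's corollary, but the specific outlier strategy (responses consistent with a second, planted regressor) and all function and variable names here are assumptions made for the example.

```python
import numpy as np

def generate_list_decodable_sample(n, d, alpha, noise_scale=0.01, seed=0):
    """Simulate Model 1.1: alpha * n inlier equations satisfy
    y_i = <x_i, l_star> + zeta for small noise zeta, while the remaining
    (1 - alpha) * n responses are replaced adversarially (here, by values
    consistent with a different regressor l_fake)."""
    rng = np.random.default_rng(seed)

    # Unknown target regressor l_star (unit norm).
    l_star = rng.standard_normal(d)
    l_star /= np.linalg.norm(l_star)

    # Inlier distribution: standard Gaussian covariates.
    X = rng.standard_normal((n, d))
    y = X @ l_star + noise_scale * rng.standard_normal(n)

    # Corrupt a (1 - alpha) fraction of the responses. With alpha < 1/2,
    # the outliers form a majority, which is why a single-output
    # estimator cannot succeed and a list of candidates is needed.
    n_out = int((1 - alpha) * n)
    outlier_idx = rng.choice(n, size=n_out, replace=False)
    l_fake = rng.standard_normal(d)
    l_fake /= np.linalg.norm(l_fake)
    y[outlier_idx] = X[outlier_idx] @ l_fake

    return X, y, l_star

# Example: 70% of the 1000 equations are corrupted (alpha = 0.3).
X, y, l_star = generate_list_decodable_sample(n=1000, d=10, alpha=0.3)
```

A list-decoding algorithm run on `(X, y)` would be expected to return a list of O(1/α) candidate regressors, one of which is close to `l_star`; only roughly an α fraction of the residuals `y - X @ l_star` are small here.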