Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
List-decodable Linear Regression
Authors: Sushrut Karmalkar, Adam Klivans, Pravesh Kothari
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We give the first polynomial-time algorithm for robust regression in the listdecodable setting where an adversary can corrupt a greater than 1/2 fraction of examples. For any α < 1, our algorithm takes as input a sample {(xi, yi)}i n of n linear equations where αn of the equations satisfy yi = xi, ℓ + ζ for some small noise ζ and (1 α)n of the equations are arbitrarily chosen. It outputs a list L of size O(1/α) a fixed constant that contains an ℓthat is close to ℓ . Our algorithm succeeds whenever the inliers are chosen from a certifiably anticoncentrated distribution D. As a corollary of our algorithmic result, we obtain a (d/α)O(1/α8) time algorithm to find a O(1/α) size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves similar guarantee under the promise that ℓ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. To solve the problem we introduce a new framework for list-decodable learning that strengthens the identifiability to algorithms paradigm based on the sum-of-squares method. |
| Researcher Affiliation | Academia | Sushrut Karmalkar University of Texas at Austin EMAIL Adam R. Klivans University of Texas at Austin EMAIL Pravesh K. Kothari Princeton University and Institute for Advanced Study EMAIL |
| Pseudocode | No | The paper describes an 'inefficient algorithm' and then 'An efficient algorithm' using numbered steps, but these are high-level prose descriptions and not formatted as structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about making its source code available, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper uses a theoretical model for sample generation (Model 1.1 Robust Linear Regression) rather than a specific named or publicly available dataset. It does not provide any access information for a dataset. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and proofs, not empirical evaluation. Therefore, it does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup. Therefore, it does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, presenting algorithms and proofs. It does not include details about an experimental setup, such as hyperparameters or system-level training settings. |