Efficient List-Decodable Regression using Batches

Authors: Abhimanyu Das, Ayush Jain, Weihao Kong, Rajat Sen

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We demonstrate the use of batches in studying list-decodable linear regression, in which only an α ∈ (0, 1] fraction of batches contain genuine samples from a common distribution and the rest can contain arbitrary or even adversarial samples. When genuine batches have Ω(1/α) samples each, our algorithm can efficiently find a small list of potential regression parameters, with a high probability that one of them is close to the true parameter. This is the first polynomial-time algorithm for list-decodable linear regression, and its sample complexity scales nearly linearly with the dimension of the covariates.
Researcher Affiliation | Collaboration | ¹Google Research, ²UC San Diego. The majority of this work was completed while Ayush Jain was an intern at Google Research. Correspondence to: Ayush Jain <ayjain@ucsd.edu>.
Pseudocode | Yes | Algorithm 1 MAINALGORITHM
1: Input: Data {{(x_i^b, y_i^b)}_{i∈[n]}}_{b∈B}, α, C, σ.
2: For each b ∈ B, β_init^b ← 1 and β_init ← {β_init^b}_{b∈B}.
3: List L ← {β_init} and M ← ∅.
4: while L ≠ ∅ do
5:   Pick any element β in L and remove it from L.
[...]
Algorithm 2 FINDCLIPPINGPARAMETER
1: Input: Set B, β, σ, a1 ≥ 1, a2, data {{(x_i^b, y_i^b)}_{i∈[n]}}_{b∈B}.
2: κ ← [...]
3: while True do
4:   w_κ ← any approximate stationary point of clipped losses {f^b(·, κ)} w.r.t. weight vector β such that E_β[f^b(w_κ, κ)] ≤ log(2/α)σ
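The stationary-point step in the quoted FINDCLIPPINGPARAMETER excerpt can be illustrated with a minimal sketch. Assuming f^b(w, κ) is a batch-averaged squared loss whose residuals are clipped at κ (the paper's exact definition may differ), an approximate stationary point of the β-weighted objective can be found by plain gradient descent. All function names, parameter choices, and the stopping rule below are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def clipped_loss_grad(w, X, y, kappa):
    """Batch-averaged clipped squared loss and its gradient.

    Residuals are clipped at kappa, so a corrupted batch with huge
    residuals contributes only a bounded gradient. (Hypothetical form
    of the paper's f^b; the actual clipped loss may differ.)
    """
    r = X @ w - y
    clipped = np.clip(r, -kappa, kappa)
    loss = np.mean(clipped ** 2)
    # d/dw min(r^2, kappa^2) = 2 r x when |r| <= kappa, else 0.
    mask = np.abs(r) <= kappa
    grad = 2.0 * X.T @ (r * mask) / len(y)
    return loss, grad

def approx_stationary_point(batches, weights, kappa,
                            lr=0.1, tol=1e-6, iters=2000):
    """Gradient descent on the weight-vector-averaged clipped loss
    until the gradient norm is small, i.e. an approximate stationary
    point as in Algorithm 2's step 4."""
    d = batches[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(iters):
        g = np.zeros(d)
        for (X, y), wt in zip(batches, weights):
            _, gb = clipped_loss_grad(w, X, y, kappa)
            g += wt * gb
        if np.linalg.norm(g) < tol:
            break
        w -= lr * g
    return w
```

The key design point clipping buys is robustness: an adversarial batch can shift the weighted gradient by at most O(κ) per sample, so down-weighting bad batches (via β) and bounding residuals together keep the stationary point near the genuine parameter.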
Open Source Code | No | The paper does not provide any statements about releasing code, nor does it include links to a code repository.
Open Datasets | No | The paper is theoretical and does not use or reference any specific public or open datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving dataset splits (e.g., training, validation, test splits).
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments; therefore, no hardware specifications are mentioned.
Software Dependencies | No | The paper does not mention any specific software dependencies or version numbers needed to replicate the work.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs. It does not describe any experimental setup details such as hyperparameters or system-level training settings.