Heavy-tailed regression with a generalized median-of-means

Authors: Daniel Hsu, Sivan Sabato

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove loss approximation guarantees that hold for general distributions, including those with heavy tails. All prior results only hold for estimators which either assume bounded or subgaussian distributions, require prior knowledge of distributional properties, or are not known to be computationally tractable. The core technique used in the proposed estimator is a new generalization of the median-of-means estimator to arbitrary metric spaces. We believe that our new generalization of this basic technique will be applicable to many other problems with heavy-tailed distributions. Indeed, the full version of this paper (Hsu & Sabato, 2013) reports additional applications to sparse linear regression and low-rank matrix approximation. We begin by stating and discussing the main results for linear regression in Section 2. We then explain the core technique in Section 3. The application of the technique for smooth and convex losses is analyzed in Section 4. Section 5 provides the derivations of our main results for regression.
Researcher Affiliation | Collaboration | Daniel Hsu (DJHSU@CS.COLUMBIA.EDU), Department of Computer Science, Columbia University; Sivan Sabato (SIVAN.SABATO@MICROSOFT.COM), Microsoft Research New England, 1 Memorial Drive, Cambridge, MA 02142
Pseudocode | Yes | Algorithm 1: Regression for heavy-tails; Algorithm 2: Median-of-means estimator; Algorithm 3: Robust approximation with random distances. An illustrative sketch of these procedures appears below the table.
Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to code repositories.
Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets, thus no information about public datasets is provided.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments, therefore no training/validation/test splits are provided.
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, thus no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe empirical experiments; therefore, no software dependencies with version numbers are listed.
Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and algorithm design rather than empirical evaluation, so no experimental setup details are provided.
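
To make the pseudocode row above concrete, the following is a minimal Python sketch of the two ingredients it names: the classical median-of-means estimator and a generalized-median selection step for regression (split the data into groups, fit one estimator per group, and pick the candidate closest to the majority of the others). This is only an illustration under simplifying assumptions: the ordinary-least-squares fit, the plain Euclidean metric, and the function names are choices made here, not the paper's exact Algorithms 1-3, which handle general smooth, strongly convex losses and use a data-dependent metric.

import numpy as np

def median_of_means(x, k, rng=None):
    """Scalar median-of-means (cf. Algorithm 2): split the sample into k
    groups, average within each group, and return the median of the k
    group means."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    groups = np.array_split(rng.permutation(len(x)), k)
    return float(np.median([x[g].mean() for g in groups]))

def heavy_tailed_regression(X, y, k, rng=None):
    """Illustrative generalized median-of-means for regression (cf.
    Algorithms 1 and 3): fit OLS on each of k disjoint subsamples, then
    return the candidate whose median distance to the other candidates
    is smallest (the "generalized median")."""
    rng = np.random.default_rng(rng)
    n, _ = X.shape
    groups = np.array_split(rng.permutation(n), k)
    # One empirical risk minimizer (here: ordinary least squares) per subsample.
    candidates = [np.linalg.lstsq(X[g], y[g], rcond=None)[0] for g in groups]
    W = np.stack(candidates)  # shape (k, d)
    # Pairwise distances between candidates; the paper uses a data-dependent
    # metric, while plain Euclidean distance is used here as a simplification.
    dists = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
    # Median distance from each candidate to all candidates (including its
    # own zero distance), then pick the candidate closest to the rest.
    return W[np.argmin(np.median(dists, axis=1))]

The design choice that drives the paper's guarantees is the group split itself: with roughly log(1/delta) groups, the median (or generalized median) of the per-group estimates concentrates with probability 1 - delta even when the data have only bounded second moments, which is why no bounded-support or sub-Gaussian assumption is needed.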