Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Online Robust Regression via SGD on the l1 loss
Authors: Scott Pesme, Nicolas Flammarion
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition, we provide experimental evidence of the efficiency of this simple and highly scalable algorithm. In this section we illustrate our theoretical results. We consider the experimental framework of [79] using synthetic datasets. |
| Researcher Affiliation | Academia | Scott Pesme EPFL Lausanne, Switzerland EMAIL Nicolas Flammarion EPFL Lausanne, Switzerland EMAIL |
| Pseudocode | No | The SGD recursion is given as an equation, "$\theta_n = \theta_{n-1} + \gamma_n \,\mathrm{sgn}(y_n - \langle x_n, \theta_{n-1} \rangle)\, x_n$", but it is not presented as a pseudocode block or a clearly labeled algorithm. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the methodology described. |
| Open Datasets | No | The paper states, "We consider the experimental framework of [79] using synthetic datasets." and describes how they are generated, but it does not provide access information (link, DOI, citation) to a publicly available dataset. |
| Dataset Splits | No | The paper describes the generation of synthetic data and the experimental setup (e.g., contamination models), but it does not explicitly provide details about train/validation/test splits (e.g., percentages, sample counts, or specific split files). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) used for the experiments. |
| Experiment Setup | Yes | We consider the experimental framework of [79] using synthetic datasets. The inputs $x_i$ are i.i.d. from $\mathcal{N}(0, H)$ where $H$ is either the identity matrix (conditioning $\kappa = 1$) or a p.s.d. matrix with eigenvalues $(1/k)_{1 \le k \le d}$ and random eigenvectors ($\kappa = 1/d$). The outputs $y_i$ are generated following $y_i = \langle x_i, \theta \rangle + \varepsilon_i + b_i$ where $(\varepsilon_i)_{1 \le i \le n}$ are i.i.d. from $\mathcal{N}(0, \sigma^2)$ and the $b_i$'s are defined according to the following contamination model: for $\eta > 0.5$, a set of $n/4$ corruptions are set to 1000, another $n/4$ are set to 1000 and the rest (to reach proportion $\eta > 0.5$) are sampled from $U([1, 10])$. All results are averaged over five replications. We plot the convergence rate of averaged SGD on different loss functions: the ℓ1 loss, the ℓ2 loss and the Huber loss for which we consider various parameters. In the SGD setting this corresponds to a total of 5n iterations. |
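
Since the paper provides neither pseudocode nor code, the following is a minimal sketch of the quoted setup, not the authors' implementation. It draws contaminated synthetic data for the identity-covariance case only and runs averaged SGD on the ℓ1 loss using the recursion quoted in the Pseudocode row. Several details are assumptions: the function names (`make_contaminated_data`, `averaged_l1_sgd`), the step size $\gamma_n = \gamma_0 / \sqrt{n}$, the name `theta_star` for the true parameter, the parameter values in the demo, and the choice of -1000 for one group of corruptions and +1000 for the other (the quoted text gives 1000 for both, which may reflect a minus sign lost in extraction).

```python
import math
import random


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def make_contaminated_data(n, d, eta, sigma, theta_star, rng):
    """Draw y_i = <x_i, theta_star> + eps_i + b_i with x_i ~ N(0, I_d)."""
    # The corrupted indices are chosen without looking at the inputs.
    corrupted = rng.sample(range(n), round(eta * n))
    quarter = n // 4
    b = [0.0] * n
    for rank, i in enumerate(corrupted):
        if rank < quarter:
            b[i] = -1000.0  # ASSUMPTION: sign not recoverable from the text
        elif rank < 2 * quarter:
            b[i] = 1000.0
        else:
            b[i] = rng.uniform(1.0, 10.0)  # remaining corruptions ~ U([1, 10])
    xs, ys = [], []
    for i in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        clean = sum(xj * tj for xj, tj in zip(x, theta_star))
        xs.append(x)
        ys.append(clean + rng.gauss(0.0, sigma) + b[i])
    return xs, ys


def averaged_l1_sgd(xs, ys, gamma0=1.0):
    """One pass of SGD on the l1 loss; returns the average of the iterates."""
    d = len(xs[0])
    theta = [0.0] * d      # current iterate theta_n, started at zero
    theta_bar = [0.0] * d  # running average of theta_1, ..., theta_n
    for n, (x, y) in enumerate(zip(xs, ys), start=1):
        residual = y - sum(xj * tj for xj, tj in zip(x, theta))
        # ASSUMPTION: step size gamma_n = gamma0 / sqrt(n)
        step = gamma0 / math.sqrt(n) * sign(residual)
        theta = [tj + step * xj for tj, xj in zip(theta, x)]
        theta_bar = [bj + (tj - bj) / n for bj, tj in zip(theta_bar, theta)]
    return theta_bar


if __name__ == "__main__":
    rng = random.Random(0)
    theta_star = [1.0] * 5  # hypothetical true parameter
    xs, ys = make_contaminated_data(20000, 5, 0.6, 1.0, theta_star, rng)
    theta_hat = averaged_l1_sgd(xs, ys)
    error = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_hat, theta_star)))
    print("l2 error of averaged l1-SGD:", error)
```

Each update moves the iterate by a vector of norm $\gamma_n \|x_n\|$ regardless of how large the residual is, which is why the ±1000 corruptions cannot push a single step arbitrarily far. The non-identity covariance, the ℓ2 and Huber baselines, and the five replications mentioned in the table are left out of this sketch.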