Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Robust Methods for High-Dimensional Linear Learning
Authors: Ibrahim Merad, Stéphane Gaïffas
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our results through numerical experiments using synthetic data for regression and real data sets for classification (Section 6). Our experiments confirm our theoretical findings together with a comparison to other recent approaches proposed in the literature. |
| Researcher Affiliation | Academia | Ibrahim Merad EMAIL LPSM, UMR 8001, Université Paris Diderot, Paris, France; Stéphane Gaïffas EMAIL LPSM, UMR 8001, Université Paris Diderot, Paris, France and DMA, École normale supérieure |
| Pseudocode | Yes | Algorithm: Approximate Multistage Mirror Descent (AMMD); Algorithm: Approximate Multistage Dual Averaging (AMDA) |
| Open Source Code | Yes | All algorithms introduced in this paper as well as the main baselines from the literature we use for comparisons are implemented and easily accessible in a few lines of code through our Python library called linlearn, open-sourced under the BSD-3 License on GitHub and available at https://github.com/linlearn/linlearn |
| Open Datasets | Yes | We also carry out experiments on real high dimensional binary classification data sets. These are referred to as gina and bioresponse and were both downloaded from openml.com. |
| Dataset Splits | No | The paper mentions "data splitting" as a proof artifact for theoretical guarantees, explicitly stating: "Note that we do not implement it later in our experimental section." For empirical evaluation, it mentions using a "validation subset of the data" for a heuristic, but does not provide specific details (percentages, counts, or predefined splits) for the overall training, validation, and testing of the models. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud computing instance types) used for running the experiments. It describes the experimental setup in terms of data generation and algorithms, but not the computational environment. |
| Software Dependencies | No | The paper mentions a "Python library called linlearn" (their own) and "Scikit Learn" (a third-party library) but does not provide specific version numbers for either of them. The requirement is for specific version numbers for reproducibility. |
| Experiment Setup | Yes | We generate n covariates X_i ∈ ℝ^d following a non-isotropic distribution with covariance matrix Σ and labels Y_i = X_i^⊤ θ* + ξ_i for a fixed s-sparse θ* ∈ ℝ^d and simulated noise entries ξ_i. The covariance matrix Σ is diagonal with entries drawn uniformly at random in [1, 10]. We use the least-squares loss ℓ(z, y) = ½(z − y)² in this experiment and the problem parameters are n = 500, d = 5000, s = 40, and a sparsity upper bound of 50 is given to the algorithms instead of the real value. The noise variables ξ_i always follow a Pareto distribution with parameter α = 2.05. |
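The synthetic regression setup quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `make_synthetic_data`, the seed, the Gaussian design, and the noise centering are assumptions made here for a runnable example (NumPy's `pareto` sampler returns a Lomax/Pareto-II variate with mean 1/(α − 1) for α > 1, which we subtract to center the noise).

```python
import numpy as np

def make_synthetic_data(n=500, d=5000, s=40, alpha=2.05, seed=0):
    """Sketch of the paper's synthetic setup: non-isotropic sparse
    linear regression with heavy-tailed (Pareto) noise."""
    rng = np.random.default_rng(seed)
    # Diagonal covariance Sigma with entries uniform in [1, 10].
    sigma_diag = rng.uniform(1.0, 10.0, size=d)
    # Gaussian design is an assumption; the paper only states the covariance.
    X = rng.normal(size=(n, d)) * np.sqrt(sigma_diag)
    # s-sparse ground-truth parameter theta* with random support.
    theta = np.zeros(d)
    support = rng.choice(d, size=s, replace=False)
    theta[support] = rng.normal(size=s)
    # Heavy-tailed noise: numpy's pareto(alpha) is Lomax-distributed with
    # mean 1/(alpha - 1); subtract it so the noise is centered.
    noise = rng.pareto(alpha, size=n) - 1.0 / (alpha - 1.0)
    y = X @ theta + noise
    return X, y, theta

X, y, theta = make_synthetic_data()
```

With α = 2.05 the noise has finite mean and variance but barely so (moments of order ≥ 2.05 diverge), which is exactly the heavy-tailed regime the paper's robust estimators target.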