Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Failures of Model-dependent Generalization Bounds for Least-norm Interpolation
Authors: Peter L. Bartlett, Philip M. Long
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We consider bounds on the generalization performance of the least-norm linear regressor, in the over-parameterized regime where it can interpolate the data. We describe a sense in which any generalization bound of a type that is commonly proved in statistical learning theory must sometimes be very loose when applied to analyze the least-norm interpolant. In particular, for a variety of natural joint distributions on training examples, any valid generalization bound that depends only on the output of the learning algorithm, the number of training examples, and the confidence parameter, and that satisfies a mild condition (substantially weaker than monotonicity in sample size), must sometimes be very loose: it can be bounded below by a constant when the true excess risk goes to zero. Keywords: generalization bounds, benign overfitting, linear regression, statistical learning theory, lower bounds |
| Researcher Affiliation | Collaboration | Peter L. Bartlett EMAIL University of California, Berkeley & Google, 367 Evans Hall #3860 Berkeley, CA 94720-3860. Philip M. Long EMAIL Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043. |
| Pseudocode | No | The paper defines concepts and provides lemmas and proofs, but does not include any structured pseudocode or algorithm blocks. For example, Definition 6 describes Pn in numbered steps, but this is a definition, not an algorithm. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories or mention code in supplementary materials. |
| Open Datasets | No | The paper discusses theoretical probability distributions and data generation processes (e.g., "joint distribution Dn on (x, y)-pairs is defined as follows. Let s = n, N = s², d = N². Let θ be an arbitrary unit-length vector. Let Σs be an arbitrary covariance matrix with eigenvalues λ1 = 1/81, λ2 = ⋯ = λd = 1/d². The marginal of Dn on x is then N(0, Σs).") rather than using specific, publicly available empirical datasets. No links, DOIs, or citations to external datasets are provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on empirical datasets; therefore, there is no mention of dataset splits (e.g., training, validation, test splits). |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper focuses on theoretical analysis and proofs, and as such, it does not mention any specific software dependencies or versions required to replicate experiments. |
| Experiment Setup | No | The paper presents a theoretical analysis and proofs, and does not involve empirical experiments. Therefore, there are no details provided regarding experimental setup, hyperparameters, or training configurations. |
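Although the paper itself contains no code, the object it studies is concrete. As a minimal illustrative sketch (not from the paper), the least-norm interpolant in the over-parameterized regime (d > n) is the minimum-Euclidean-norm solution of Xw = y, computable via the Moore-Penrose pseudoinverse; the variable names and dimensions below are arbitrary choices for illustration:

```python
# Sketch of the least-norm interpolant w = X^+ y in the over-parameterized
# regime (d > n), where infinitely many w satisfy X w = y exactly.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                     # more features than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

X_pinv = np.linalg.pinv(X)
w = X_pinv @ y                     # least-norm interpolant

# w interpolates the training data exactly (X has full row rank a.s.) ...
assert np.allclose(X @ w, y)

# ... and has the smallest norm among all interpolating solutions:
# adding any null-space component preserves interpolation but grows the norm.
null_component = (np.eye(d) - X_pinv @ X) @ rng.standard_normal(d)
w_alt = w + null_component
assert np.allclose(X @ w_alt, y)
assert np.linalg.norm(w) <= np.linalg.norm(w_alt) + 1e-9
```

The paper's lower bounds concern generalization guarantees for exactly this estimator, so no hyperparameters or training configuration arise: the solution is given in closed form.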