Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Implicit Bias of Benign Overfitting

Author: Ohad Shamir

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we provide several new results on when one can or cannot expect benign overfitting to occur, for both regression and classification tasks. We consider a prototypical and rather generic data model for benign overfitting of linear predictors... We prove that the max-margin predictor... is asymptotically biased towards minimizing a weighted squared hinge loss. This allows us to reduce the question of benign overfitting in classification to the simpler question of whether this loss is a good surrogate for the misclassification error, and use it to show benign overfitting in some new settings. The formal proofs of all our results appear in Appendix A.
Researcher Affiliation | Academia | Ohad Shamir EMAIL Department of Computer Science and Applied Mathematics Weizmann Institute of Science Rehovot, Israel
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. Methods are described through mathematical formulations and proofs.
Open Source Code | No | The paper does not provide any concrete access to source code or explicitly state that code for the methodology described is available.
Open Datasets | No | The paper focuses on theoretical analysis using a 'prototypical and rather generic data model' and theoretical distributions (e.g., 'independent zero-mean Gaussian with covariance matrix 1/d_k I' in Example 1). It does not mention or provide access information for any publicly available datasets used for experimental evaluation.
Dataset Splits | No | The paper describes theoretical models and mathematical proofs, not empirical experiments. Therefore, it does not specify any training/test/validation dataset splits.
Hardware Specification | No | This paper is theoretical and does not report on any empirical experiments. Therefore, no hardware specifications are mentioned for running experiments.
Software Dependencies | No | This paper is theoretical and does not report on any empirical experiments. Therefore, no specific software dependencies with version numbers are mentioned.
Experiment Setup | No | This paper focuses on theoretical analysis and does not describe any empirical experimental setup, hyperparameters, or training configurations.
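For readers unfamiliar with the objects the LLM responses quote, here is a minimal Python sketch of the two ingredients named above: inputs whose k-th block is independent zero-mean Gaussian with covariance (1/d_k) I (as in the paper's Example 1), and the squared hinge surrogate loss. The block sizes, predictor, and label rule below are assumptions for illustration only, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_inputs(n, block_dims):
    """Draw n inputs; block k is zero-mean Gaussian with covariance (1/d_k) I.

    block_dims is a hypothetical choice of block sizes for this sketch.
    """
    blocks = [rng.normal(0.0, np.sqrt(1.0 / d), size=(n, d)) for d in block_dims]
    return np.concatenate(blocks, axis=1)

def squared_hinge(margins):
    """Squared hinge loss max(0, 1 - m)^2 applied elementwise to margins."""
    return np.maximum(0.0, 1.0 - margins) ** 2

# Illustrative use with assumed dimensions and an arbitrary linear predictor.
X = sample_inputs(n=5, block_dims=[3, 7])        # shape (5, 10)
w = np.ones(X.shape[1])                          # arbitrary predictor, for illustration
y = np.where(X.sum(axis=1) >= 0, 1.0, -1.0)     # assumed labels, for illustration
mean_loss = squared_hinge(y * (X @ w)).mean()
```

A correctly classified point with margin at least 1 incurs zero loss; margins below 1 are penalized quadratically, which is what makes the surrogate smooth enough to compare against the misclassification error.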