Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study

Authors: Assaf Dauber, Meir Feder, Tomer Koren, Roi Livni

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | As a first step, we provide a simple construction that rules out the existence of a distribution-independent implicit regularizer that governs the generalization ability of SGD. We then demonstrate a learning problem that rules out a very general class of distribution-dependent implicit regularizers from explaining generalization, which includes strongly convex regularizers as well as non-degenerate norm-based regularizations.
Researcher Affiliation | Collaboration | Assaf Dauber, Department of Electrical Engineering, Tel-Aviv University, assafdauber@mail.tau.ac.il; Meir Feder, Department of Electrical Engineering, Tel-Aviv University, meir@tauex.tau.ac.il; Tomer Koren, School of CS, Tel Aviv University & Google Research Tel Aviv, tkoren@tauex.tau.ac.il; Roi Livni, Department of Electrical Engineering, Tel Aviv University, rlivni@tauex.tau.ac.il
Pseudocode | No | The paper describes algorithms (SGD, GD) using mathematical equations and textual descriptions, but does not include a structured pseudocode or algorithm block.
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository.
Open Datasets | No | The paper presents theoretical constructions and does not use or refer to publicly available datasets in the context of empirical training and evaluation.
Dataset Splits | No | The paper is theoretical and focuses on mathematical constructions and proofs; therefore it does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper includes simulations but does not provide specific details about the hardware used (e.g., GPU/CPU models, memory, or cloud instances).
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers required to replicate any part of the work.
Experiment Setup | Yes | Figure 1: Simulation of GD (with step size η = 0.2) on f_{A,Σ} for θ = 1 and varying values of b. We see that GD does not necessarily converge to the nearest solution, and tuning b changes the point towards which it is biased.
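
For readers who want to see what a simulation of this kind looks like, the following minimal Python sketch mirrors only what the Figure 1 caption states: full-batch (sub)gradient descent with step size η = 0.2, started at the origin, repeated for a few values of b. The function placeholder_objective, the dimension, the iteration budget, and the swept values of b are all assumptions introduced here for illustration; the paper's actual construction f_{A,Σ} (and the role of θ) is not reconstructed, so this shows the simulation scaffolding rather than the reported bias phenomenon.

import numpy as np

ETA = 0.2        # step size reported in the Figure 1 caption
N_STEPS = 2000   # iteration budget (assumption; not stated in the caption)


def placeholder_objective(w, b):
    # Hypothetical convex surrogate with a non-unique minimizer set:
    # f_b(w) = max(a.w - b, 0), so every w with a.w <= b is a minimizer.
    a = np.array([1.0, 1.0])
    margin = float(a @ w) - b
    value = max(margin, 0.0)
    grad = a if margin > 0 else np.zeros_like(w)  # a valid subgradient
    return value, grad


def run_gd(b, w0):
    # Plain full-batch (sub)gradient descent starting from w0.
    w = w0.copy()
    for _ in range(N_STEPS):
        _, g = placeholder_objective(w, b)
        w = w - ETA * g
    return w


if __name__ == "__main__":
    w0 = np.zeros(2)
    for b in (-2.0, -1.0, -0.5):
        w_gd = run_gd(b, w0)
        print(f"b = {b:+.1f}  GD limit = {w_gd}  "
              f"distance from w0 = {np.linalg.norm(w_gd - w0):.3f}")

Swapping the placeholder for the paper's f_{A,Σ} and comparing the GD limit against the Euclidean projection of the starting point onto the minimizer set would recover the comparison visualized in Figure 1, namely whether GD converges to the nearest solution and how that limit moves as b is tuned.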