Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study

Authors: Assaf Dauber, Meir Feder, Tomer Koren, Roi Livni

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | As a first step, we provide a simple construction that rules out the existence of a distribution-independent implicit regularizer that governs the generalization ability of SGD. We then demonstrate a learning problem that rules out a very general class of distribution-dependent implicit regularizers from explaining generalization, which includes strongly convex regularizers as well as non-degenerate norm-based regularizations.
Researcher Affiliation | Collaboration | Assaf Dauber, Department of Electrical Engineering, Tel-Aviv University, assafdauber@mail.tau.ac.il; Meir Feder, Department of Electrical Engineering, Tel-Aviv University, meir@tauex.tau.ac.il; Tomer Koren, School of CS, Tel Aviv University & Google Research Tel Aviv, tkoren@tauex.tau.ac.il; Roi Livni, Department of Electrical Engineering, Tel Aviv University, rlivni@tauex.tau.ac.il
Pseudocode | No | The paper describes algorithms (SGD, GD) using mathematical equations and textual descriptions, but does not include a structured pseudocode or algorithm block.
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository.
Open Datasets | No | The paper presents theoretical constructions and does not use or refer to publicly available datasets in the context of empirical training and evaluation.
Dataset Splits | No | The paper is theoretical and focuses on mathematical constructions and proofs; therefore it does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper includes simulations but does not provide specific details about the hardware used (e.g., GPU/CPU models, memory, or cloud instances).
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers required to replicate any part of the work.
Experiment Setup | Yes | Figure 1: Simulation of GD (with step size η = 0.2) on f_{A,Σ} for θ = 1 and varying values of b. We see that GD does not necessarily converge to the nearest solution, and tuning b changes the point towards which it is biased.
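
For readers who want to see what a simulation of this kind looks like, the following minimal Python sketch mirrors only what the Figure 1 caption states: full-batch (sub)gradient descent with step size η = 0.2, started at the origin, repeated for a few values of b. The function placeholder_objective, the dimension, the iteration budget, and the swept values of b are all assumptions introduced here for illustration; the paper's actual construction f_{A,Σ} (and the role of θ) is not reconstructed, so this shows the simulation scaffolding rather than the reported bias phenomenon.

import numpy as np

ETA = 0.2        # step size reported in the Figure 1 caption
N_STEPS = 2000   # iteration budget (assumption; not stated in the caption)


def placeholder_objective(w, b):
    # Hypothetical convex surrogate with a non-unique minimizer set:
    # f_b(w) = max(a.w - b, 0), so every w with a.w <= b is a minimizer.
    a = np.array([1.0, 1.0])
    margin = float(a @ w) - b
    value = max(margin, 0.0)
    grad = a if margin > 0 else np.zeros_like(w)  # a valid subgradient
    return value, grad


def run_gd(b, w0):
    # Plain full-batch (sub)gradient descent starting from w0.
    w = w0.copy()
    for _ in range(N_STEPS):
        _, g = placeholder_objective(w, b)
        w = w - ETA * g
    return w


if __name__ == "__main__":
    w0 = np.zeros(2)
    for b in (-2.0, -1.0, -0.5):
        w_gd = run_gd(b, w0)
        print(f"b = {b:+.1f}  GD limit = {w_gd}  "
              f"distance from w0 = {np.linalg.norm(w_gd - w0):.3f}")

Swapping the placeholder for the paper's f_{A,Σ} and comparing the GD limit against the Euclidean projection of the starting point onto the minimizer set would recover the comparison visualized in Figure 1, namely whether GD converges to the nearest solution and how that limit moves as b is tuned.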