Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study
Authors: Assaf Dauber, Meir Feder, Tomer Koren, Roi Livni
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | As a first step, we provide a simple construction that rules out the existence of a distribution-independent implicit regularizer that governs the generalization ability of SGD. We then demonstrate a learning problem that rules out a very general class of distribution-dependent implicit regularizers from explaining generalization, which includes strongly convex regularizers as well as non-degenerate norm-based regularizations. |
| Researcher Affiliation | Collaboration | Assaf Dauber, Department of Electrical Engineering, Tel-Aviv University (assafdauber@mail.tau.ac.il); Meir Feder, Department of Electrical Engineering, Tel-Aviv University (meir@tauex.tau.ac.il); Tomer Koren, School of CS, Tel Aviv University & Google Research Tel Aviv (tkoren@tauex.tau.ac.il); Roi Livni, Department of Electrical Engineering, Tel Aviv University (rlivni@tauex.tau.ac.il) |
| Pseudocode | No | The paper describes algorithms (SGD, GD) using mathematical equations and textual descriptions, but does not include a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository. |
| Open Datasets | No | The paper presents theoretical constructions and does not use or refer to publicly available datasets in the context of empirical training and evaluation. |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical constructions and proofs, therefore it does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper includes simulations but does not provide specific details about the hardware used (e.g., GPU/CPU models, memory, or cloud instances). |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers required to replicate any part of the work. |
| Experiment Setup | Yes | Figure 1: Simulation of GD (with step size η = 0.2) on f_{A,Σ} for θ = 1 and varying values of b. We see that GD does not necessarily converge to the nearest solution, and tuning b changes the point toward which it is biased. |
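
The table above records only the reported GD settings (step size η = 0.2, varying b). As a rough illustration of what such a simulation looks like, here is a minimal Python/NumPy sketch of full-batch gradient descent. The paper's exact objective f_{A,Σ} and its parameters are not reproduced here: the quadratic stand-in, the matrix `A`, the vector `b`, the dimension, and the zero initialization are all hypothetical placeholders; only the step size comes from the reported settings.

```python
import numpy as np

# Minimal sketch, NOT the paper's construction: we stand in a generic
# quadratic f(w) = 0.5 * (w - b)^T A (w - b) for the objective f_{A,Σ}.
# A, b, the dimension, and the initialization are assumptions; only the
# step size eta = 0.2 is taken from the reported experiment setup.

def gradient_descent(grad, w0, eta=0.2, steps=500):
    """Plain full-batch gradient descent: w <- w - eta * grad(w)."""
    w = w0.copy()
    trajectory = [w.copy()]
    for _ in range(steps):
        w = w - eta * grad(w)
        trajectory.append(w.copy())
    return w, np.array(trajectory)

# Hypothetical 2-D instance; b is the parameter varied in Figure 1.
A = np.array([[1.0, 0.0], [0.0, 0.1]])  # assumed PSD curvature matrix
b = np.array([1.0, 3.0])                # assumed shift parameter
grad_f = lambda w: A @ (w - b)

w_final, traj = gradient_descent(grad_f, w0=np.zeros(2), eta=0.2)
print("GD converged to:", w_final)
```

In this toy instance, changing `b` moves the point GD converges to, loosely mirroring the caption's observation that tuning b changes the point toward which GD is biased; the paper's actual figure relies on its specific construction of f_{A,Σ}.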