Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
Authors: Edward Moroshko, Blake E. Woodworth, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 ("Numerical Simulations and Discussion"): "We numerically study optimization trajectories to see whether we can observe the asymptotic phenomena studied at finite initialization and accuracy. In all our simulations we employ the Normalized GD algorithm, where the gradient is normalized by the loss itself, to accelerate convergence [21]." |
| Researcher Affiliation | Collaboration | Edward Moroshko (edward.moroshko@gmail.com, Technion); Blake Woodworth (blake@ttic.edu, TTI Chicago); Suriya Gunasekar (suriya@ttic.edu, Microsoft Research); Jason D. Lee (jasonlee@princeton.edu, Princeton University); Nathan Srebro (nati@ttic.edu, TTI Chicago); Daniel Soudry (daniel.soudry@gmail.com, Technion) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about making its source code publicly available or links to a code repository. |
| Open Datasets | No | The paper states: “We plot trajectories for training depth D = 2 diagonal linear networks in dimension d = 3, on several constructed datasets, each consisting of three points.” and provides examples like “Data: (0.3, 1.5, 1), (1.5, 3, 1), (1, 2.5, 1)”. These are small, custom datasets presented directly in the text/figures, without any links, DOIs, or citations to public repositories for access. |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits, nor does it refer to predefined splits from external datasets. The datasets used are small and constructed within the paper itself. |
| Hardware Specification | No | The paper describes numerical simulations and discusses the algorithm and learning rate, but gives no details about the hardware (e.g., GPU models, CPU types, memory, or cloud resources) used to run them. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as libraries or frameworks with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8'), that were used for the simulations. |
| Experiment Setup | Yes | The paper uses Normalized GD (the gradient divided by the loss itself) to train depth D = 2 diagonal linear networks in dimension d = 3 on constructed three-point datasets, with the learning rate kept small enough to ensure gradient-flow-like dynamics (always below 10^-3). A hedged sketch of this setup follows the table. |
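
For context, here is a minimal sketch of the setup the table describes: Normalized GD on a depth D = 2 diagonal linear network in dimension d = 3, using one of the three-point datasets quoted above. The exponential loss, the elementwise w_+^D - w_-^D parameterization, the all-positive labels, and the initialization scale `alpha` are illustrative assumptions; the quoted excerpts do not pin them down.

```python
import numpy as np

# The three-point dataset quoted from the paper; labels are assumed
# to all be +1 (the excerpts do not state them explicitly).
X = np.array([[0.3, 1.5, 1.0],
              [1.5, 3.0, 1.0],
              [1.0, 2.5, 1.0]])
y = np.ones(3)

D = 2         # network depth, as in the paper's simulations
alpha = 0.1   # initialization scale (assumed; a free parameter in the analysis)
lr = 1e-4     # learning rate, kept below 10^-3 as the paper reports

# Depth-D diagonal linear network: f(x) = <w_plus**D - w_minus**D, x>
w_plus = alpha * np.ones(3)
w_minus = alpha * np.ones(3)

for step in range(100_000):
    w = w_plus**D - w_minus**D
    losses = np.exp(-y * (X @ w))   # exponential loss per example
    L = losses.sum()

    # dL/dw, then the chain rule through the diagonal parameterization
    grad_w = -(losses * y) @ X
    grad_plus = grad_w * D * w_plus**(D - 1)
    grad_minus = -grad_w * D * w_minus**(D - 1)

    # Normalized GD: the gradient is divided by the loss itself (cf. [21])
    w_plus -= lr * grad_plus / L
    w_minus -= lr * grad_minus / L

w = w_plus**D - w_minus**D
print("learned linear predictor:", w, "min margin:", (y * (X @ w)).min())
```

With the learning rate below 10^-3, the discrete updates closely track the gradient-flow trajectory the paper analyzes; dividing by the loss compensates for the exponentially vanishing gradient late in training, which is why the paper uses it to accelerate convergence.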