Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

Authors: Edward Moroshko, Blake E. Woodworth, Suriya Gunasekar, Jason D. Lee, Nati Srebro, Daniel Soudry

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quoting Section 5, "Numerical Simulations and Discussion": "We numerically study optimization trajectories to see whether we can observe the asymptotic phenomena studied at finite initialization and accuracy. In all our simulations we employ the Normalized GD algorithm, where the gradient is normalized by the loss itself, to accelerate convergence [21]." (The update rule is written out after the table.)
Researcher Affiliation | Collaboration | Edward Moroshko (edward.moroshko@gmail.com, Technion); Blake Woodworth (blake@ttic.edu, TTI Chicago); Suriya Gunasekar (suriya@ttic.edu, Microsoft Research); Jason D. Lee (jasonlee@princeton.edu, Princeton University); Nathan Srebro (nati@ttic.edu, TTI Chicago); Daniel Soudry (daniel.soudry@gmail.com, Technion)
Pseudocode | No | The paper contains no pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper makes no statement about releasing its source code and provides no link to a code repository.
Open Datasets | No | The paper states: "We plot trajectories for training depth D = 2 diagonal linear networks in dimension d = 3, on several constructed datasets, each consisting of three points." and gives examples such as "Data: (0.3, 1.5, 1), (1.5, 3, 1), (1, 2.5, 1)". These small, custom datasets appear directly in the text and figures, with no links, DOIs, or citations to public repositories. (A sketch reconstructing one such dataset follows the table.)
Dataset Splits | No | The paper specifies no training/validation/test splits and refers to no predefined splits from external datasets; the datasets are small and constructed within the paper itself.
Hardware Specification | No | The paper describes its numerical simulations, the algorithm, and the learning rate, but gives no details about the hardware used (e.g., GPU models, CPU types, memory, or cloud resources).
Software Dependencies | No | The paper lists no software dependencies for its simulations, such as libraries or frameworks with version numbers (e.g., "PyTorch 1.9", "Python 3.8").
Experiment Setup | Yes | The learning rate was small enough to ensure gradient flow-like dynamics (always below 10^-3). Combined with the Section 5 details quoted above (Normalized GD, depth D = 2 diagonal linear networks, d = 3, three-point datasets), this pins down the simulation setup; a runnable sketch follows the table.
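
For reference, the Normalized GD step quoted in the Research Type row can be written out explicitly. The exponential loss and the w_+/w_- diagonal-network parameterization below are the standard choices in this line of work and are consistent with the paper's setting, but they are reconstructed from context rather than quoted in the table above:

```latex
% Normalized GD: a plain gradient step whose gradient is divided by the
% current loss value, which accelerates convergence late in training.
\[
  w_{t+1} \;=\; w_t \;-\; \eta\,\frac{\nabla \mathcal{L}(w_t)}{\mathcal{L}(w_t)},
  \qquad
  \mathcal{L}(w) \;=\; \sum_{n=1}^{N} \exp\!\bigl(-y_n \langle \beta_w, x_n \rangle\bigr),
\]
% For a depth-D diagonal linear network, the effective linear predictor is
% an elementwise (Hadamard) power of the two weight vectors:
\[
  \beta_w \;=\; w_+^{\circ D} \,-\, w_-^{\circ D}.
\]
```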
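A minimal numpy sketch of the depth D = 2 case on the three-point dataset quoted in the Open Datasets row is below. Assumptions beyond what the table quotes: the exponential loss, labels all equal to +1 (the trailing 1 in each triple is read as a bias coordinate), the initialization scale alpha, and the iteration count are illustrative choices, not values taken from the paper:

```python
# Hypothetical reconstruction of the Section 5 simulation setup.
# Assumed (not quoted above): exponential loss, depth-2 diagonal
# parameterization beta = w_plus**2 - w_minus**2, labels all +1,
# and illustrative values for alpha and the iteration count.
import numpy as np

# One constructed dataset quoted in the "Open Datasets" row, read as
# three points in R^3 with the trailing 1 acting as a bias coordinate.
X = np.array([[0.3, 1.5, 1.0],
              [1.5, 3.0, 1.0],
              [1.0, 2.5, 1.0]])
y = np.ones(3)  # assumed labels

alpha = 0.1                     # initialization scale (illustrative)
w_plus = alpha * np.ones(3)
w_minus = alpha * np.ones(3)
lr = 1e-3                       # "always below 10^-3" per the setup row

def beta(w_plus, w_minus):
    # Depth-2 diagonal linear network: the effective linear predictor.
    return w_plus**2 - w_minus**2

for step in range(100_000):
    margins = y * (X @ beta(w_plus, w_minus))
    losses = np.exp(-margins)
    L = losses.sum()
    # Gradient of the exponential loss w.r.t. the effective predictor beta.
    g_beta = -(X * (y * losses)[:, None]).sum(axis=0)
    # Chain rule through the diagonal parameterization.
    g_plus = 2.0 * w_plus * g_beta
    g_minus = -2.0 * w_minus * g_beta
    # Normalized GD: divide the gradient by the loss itself ([21] in the paper).
    w_plus = w_plus - lr * g_plus / L
    w_minus = w_minus - lr * g_minus / L

print("final predictor beta:", beta(w_plus, w_minus))
```

Dividing the gradient by the current loss keeps the effective step size roughly constant even as the exponential loss decays toward zero, which matches the quoted motivation of accelerating convergence to very small training losses.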