Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
Authors: Edward Moroshko, Blake E. Woodworth, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 ("Numerical Simulations and Discussion"): "We numerically study optimization trajectories to see whether we can observe the asymptotic phenomena studied at finite initialization and accuracy. In all our simulations we employ the Normalized GD algorithm, where the gradient is normalized by the loss itself, to accelerate convergence [21]." |
| Researcher Affiliation | Collaboration | Edward Moroshko (edward.moroshko@gmail.com, Technion); Blake Woodworth (blake@ttic.edu, TTI Chicago); Suriya Gunasekar (suriya@ttic.edu, Microsoft Research); Jason D. Lee (jasonlee@princeton.edu, Princeton University); Nathan Srebro (nati@ttic.edu, TTI Chicago); Daniel Soudry (daniel.soudry@gmail.com, Technion) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about making its source code publicly available or links to a code repository. |
| Open Datasets | No | The paper states: “We plot trajectories for training depth D = 2 diagonal linear networks in dimension d = 3, on several constructed datasets, each consisting of three points.” and provides examples like “Data: (0.3, 1.5, 1), (1.5, 3, 1), (1, 2.5, 1)”. These are small, custom datasets presented directly in the text/figures, without any links, DOIs, or citations to public repositories for access. |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits, nor does it refer to predefined splits from external datasets. The datasets used are small and constructed within the paper itself. |
| Hardware Specification | No | The paper describes numerical simulations and discusses the algorithm and learning rate, but gives no details about the hardware (e.g., GPU models, CPU types, memory, or cloud resources) used to run them. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as libraries or frameworks with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8'), that were used for the simulations. |
| Experiment Setup | Yes | The paper uses Normalized GD (the gradient divided by the loss itself) to train depth D = 2 diagonal linear networks in dimension d = 3 on constructed three-point datasets, with the learning rate kept small enough to ensure gradient-flow-like dynamics (always below 10^-3). A hedged sketch of this setup follows the table. |
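
For context, here is a minimal sketch of the setup the table describes: Normalized GD on a depth D = 2 diagonal linear network in dimension d = 3, using one of the three-point datasets quoted above. The exponential loss, the elementwise w_+^D - w_-^D parameterization, the all-positive labels, and the initialization scale `alpha` are illustrative assumptions; the quoted excerpts do not pin them down.

```python
import numpy as np

# The three-point dataset quoted from the paper; labels are assumed
# to all be +1 (the excerpts do not state them explicitly).
X = np.array([[0.3, 1.5, 1.0],
              [1.5, 3.0, 1.0],
              [1.0, 2.5, 1.0]])
y = np.ones(3)

D = 2         # network depth, as in the paper's simulations
alpha = 0.1   # initialization scale (assumed; a free parameter in the analysis)
lr = 1e-4     # learning rate, kept below 10^-3 as the paper reports

# Depth-D diagonal linear network: f(x) = <w_plus**D - w_minus**D, x>
w_plus = alpha * np.ones(3)
w_minus = alpha * np.ones(3)

for step in range(100_000):
    w = w_plus**D - w_minus**D
    losses = np.exp(-y * (X @ w))   # exponential loss per example
    L = losses.sum()

    # dL/dw, then the chain rule through the diagonal parameterization
    grad_w = -(losses * y) @ X
    grad_plus = grad_w * D * w_plus**(D - 1)
    grad_minus = -grad_w * D * w_minus**(D - 1)

    # Normalized GD: the gradient is divided by the loss itself (cf. [21])
    w_plus -= lr * grad_plus / L
    w_minus -= lr * grad_minus / L

w = w_plus**D - w_minus**D
print("learned linear predictor:", w, "min margin:", (y * (X @ w)).min())
```

With the learning rate below 10^-3, the discrete updates closely track the gradient-flow trajectory the paper analyzes; dividing by the loss compensates for the exponentially vanishing gradient late in training, which is why the paper uses it to accelerate convergence.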