Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
Authors: Edward Moroshko, Blake E. Woodworth, Suriya Gunasekar, Jason D. Lee, Nati Srebro, Daniel Soudry
NeurIPS 2020
| Reproducibility Variable | Classification | Supporting Evidence (LLM Response) |
|---|---|---|
| Research Type | Experimental | Section 5 ("Numerical Simulations and Discussion"): "We numerically study optimization trajectories to see whether we can observe the asymptotic phenomena studied at finite initialization and accuracy. In all our simulations we employ the Normalized GD algorithm, where the gradient is normalized by the loss itself, to accelerate convergence [21]." A sketch of this setup appears after the table. |
| Researcher Affiliation | Collaboration | Edward Moroshko (Technion), Blake Woodworth (TTI-Chicago), Suriya Gunasekar (Microsoft Research), Jason D. Lee (Princeton University), Nathan Srebro (TTI-Chicago), Daniel Soudry (Technion) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about making its source code publicly available or links to a code repository. |
| Open Datasets | No | The paper states: “We plot trajectories for training depth D = 2 diagonal linear networks in dimension d = 3, on several constructed datasets, each consisting of three points.” and provides examples like “Data: (0.3, 1.5, 1), (1.5, 3, 1), (1, 2.5, 1)”. These are small, custom datasets presented directly in the text/figures, without any links, DOIs, or citations to public repositories for access. |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits, nor does it refer to predefined splits from external datasets. The datasets used are small and constructed within the paper itself. |
| Hardware Specification | No | The paper describes "Numerical Simulations" and discusses the algorithm and learning rate, but does not provide any details about the hardware (e.g., GPU models, CPU types, memory, or cloud resources) used to run them. |
| Software Dependencies | No | The paper does not list specific software dependencies, such as libraries or frameworks with version numbers (e.g., "PyTorch 1.9", "Python 3.8"), used for the simulations. |
| Experiment Setup | Yes | The paper reports that the learning rate was small enough to ensure gradient-flow-like dynamics (always below 10^-3). |
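
Taken together, the rows above describe the paper's numerical setup: Normalized GD on depth-2 diagonal linear networks in dimension d = 3, trained on constructed three-point datasets with a learning rate below 10^-3. The sketch below combines those quoted details into a runnable toy run. It is a minimal illustration rather than the authors' code: the exponential loss, the elementwise parameterization beta = w_plus^2 - w_minus^2, the all-positive labels, and the initialization scale `alpha` are assumptions consistent with this line of work but are not confirmed by the quotes above.

```python
import numpy as np

# Three-point dataset quoted in the "Open Datasets" row (dimension d = 3).
X = np.array([[0.3, 1.5, 1.0],
              [1.5, 3.0, 1.0],
              [1.0, 2.5, 1.0]])
y = np.ones(3)  # assumption: all labels are +1 (separable data)

def loss(w_plus, w_minus):
    """Exponential loss of an (assumed) depth-2 diagonal linear network."""
    beta = w_plus**2 - w_minus**2            # elementwise "diagonal" predictor
    return np.exp(-y * (X @ beta)).sum()

def grads(w_plus, w_minus):
    beta = w_plus**2 - w_minus**2
    # dL/dbeta = -sum_i exp(-y_i <beta, x_i>) * y_i * x_i
    g_beta = -(np.exp(-y * (X @ beta)) * y) @ X
    # Chain rule through the elementwise squares.
    return 2.0 * w_plus * g_beta, -2.0 * w_minus * g_beta

alpha = 0.1   # initialization scale, the quantity the paper's title refers to
eta = 1e-4    # "always below 10^-3", per the Experiment Setup row
w_plus = alpha * np.ones(3)
w_minus = alpha * np.ones(3)

for _ in range(100_000):
    cur_loss = loss(w_plus, w_minus)
    g_p, g_m = grads(w_plus, w_minus)
    # Normalized GD: the gradient is divided by the loss itself,
    # as quoted in the "Research Type" row.
    w_plus -= eta * g_p / cur_loss
    w_minus -= eta * g_m / cur_loss

print("learned predictor beta:", w_plus**2 - w_minus**2)
```

Dividing the gradient by the current loss acts as an adaptive step size, consistent with the quoted motivation of accelerating convergence, and varying `alpha` is how one would probe the initialization-scale effect the paper studies.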