On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

Authors: Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake E. Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 7.1 caption: 'The population error of the gradient flow solution for a diagonal linear network as a function of initialization scale α and shape s, in the sparse regression problem described in Section 9.' Section 9 (Numerical Simulations Details): 'In order to study the effect of initialization on the implicit bias of gradient flow, we follow the sparse regression problem suggested by Woodworth et al. (2020), where x(1), ..., x(N) ~ N(0, I) and y(n) ~ N(⟨β*, x(n)⟩, 0.01), and β* is r-sparse with non-zero entries equal to 1/√r. For every N ≤ d, gradient flow will generally reach a zero-training-error solution; however, not all of these solutions are the same, allowing us to explore the effect of initialization on the implicit bias.' (A data-generation sketch appears after the table.)
Researcher Affiliation | Academia | (1) The Blavatnik School of Computer Science, Tel Aviv University; (2) Technion - Israel Institute of Technology; (3) Toyota Technological Institute at Chicago.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access information for open-source code (e.g., a specific repository link or an explicit code-release statement) for the described methodology.
Open Datasets | No | The paper describes a synthetic data-generation process: 'sparse regression problem suggested by Woodworth et al. (2020), where x(1), ..., x(N) ~ N(0, I) and y(n) ~ N(⟨β*, x(n)⟩, 0.01), and β* is r-sparse with non-zero entries equal to 1/√r.' This specifies how the data are created, but no link, DOI, or repository for a pre-existing, publicly available dataset is given.
Dataset Splits | No | The paper states the dataset parameters (N = 100, d = 1000, r = 5) but provides no training/validation/test split details (e.g., percentages, sample counts, or splitting methodology).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud computing instance types) used to run its experiments.
Software Dependencies | No | The paper does not name the ancillary software needed to replicate the experiments (e.g., libraries or solvers with version numbers).
Experiment Setup | No | The paper specifies the synthetic data-generation parameters (N = 100, d = 1000, r = 5) in Section 9, but it does not report setup details for the gradient flow training itself, such as hyperparameters (learning rate, batch size, number of epochs, or optimizer settings). (A hedged gradient-flow sketch appears after the table.)
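
To make the quoted setup concrete, here is a minimal sketch of the sparse-regression data generation described in Section 9, using the reported parameters N = 100, d = 1000, r = 5. The function name, random seed handling, and choice of support indices are our own illustrative assumptions; everything else follows the quoted description.

```python
import numpy as np

def make_sparse_regression(N=100, d=1000, r=5, noise_var=0.01, seed=0):
    """Sparse regression data as described in Section 9 (Woodworth et al., 2020 setup).

    x(n) ~ N(0, I_d), y(n) ~ N(<beta*, x(n)>, noise_var), with beta* r-sparse
    and its non-zero entries equal to 1/sqrt(r), so that ||beta*||_2 = 1.
    """
    rng = np.random.default_rng(seed)
    beta_star = np.zeros(d)
    beta_star[:r] = 1.0 / np.sqrt(r)  # support choice is arbitrary by symmetry
    X = rng.standard_normal((N, d))   # rows are x(1), ..., x(N)
    y = X @ beta_star + np.sqrt(noise_var) * rng.standard_normal(N)
    return X, y, beta_star
```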
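The paper does not report how gradient flow was simulated, so the following is only a plausible sketch, not the authors' implementation: it assumes the diagonal-network parameterization β = u ⊙ v with unbalanced initialization u(0) = α·1 and v(0) = s·α·1 as one reading of 'initialization scale α and shape s' (the paper's exact parameterization may differ), and it approximates gradient flow by forward-Euler gradient descent on the squared loss. The step size, iteration budget, and stopping tolerance are illustrative choices, not values from the paper.

```python
import numpy as np

def gradient_flow_diag_net(X, y, alpha=0.01, s=1.0, lr=1e-3, steps=200_000, tol=1e-8):
    """Approximate gradient flow for a diagonal linear network beta = u * v.

    Assumptions (not from the paper): u(0) = alpha * 1 and v(0) = s * alpha * 1
    as a stand-in for 'initialization scale alpha and shape s'; forward-Euler
    discretization of gradient flow on L(beta) = ||X beta - y||^2 / (2N).
    """
    N, d = X.shape
    u = alpha * np.ones(d)
    v = s * alpha * np.ones(d)
    for _ in range(steps):
        residual = X @ (u * v) - y        # shape (N,)
        grad_beta = X.T @ residual / N    # dL/dbeta
        # Chain rule: dL/du = grad_beta * v, dL/dv = grad_beta * u (simultaneous update).
        u, v = u - lr * grad_beta * v, v - lr * grad_beta * u
        if residual @ residual / N < tol:  # near-zero training error reached
            break
    return u * v

# Usage, combined with make_sparse_regression from the previous sketch; this
# would approximate a single (alpha, s) cell of Figure 7.1, assuming the
# "population error" is the expected squared prediction error under the data
# model, i.e. ||beta_hat - beta*||^2 + noise variance:
X, y, beta_star = make_sparse_regression(N=100, d=1000, r=5)
beta_hat = gradient_flow_diag_net(X, y, alpha=0.01, s=1.0)
pop_err = float(np.sum((beta_hat - beta_star) ** 2)) + 0.01
```

The simultaneous tuple update for (u, v) matters: updating u in place before computing v's gradient would no longer discretize the same flow.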