On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

Authors: Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake E. Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 7.1 caption: 'The population error of the gradient flow solution for a diagonal linear network as a function of initialization scale α and shape s, in the sparse regression problem described in Section 9.' Section 9 (Numerical Simulations Details): 'In order to study the effect of initialization on the implicit bias of gradient flow, we follow the sparse regression problem suggested by Woodworth et al. (2020), where x(1), ..., x(N) ~ N(0, I) and y(n) ~ N(⟨β*, x(n)⟩, 0.01), and β* is r-sparse with non-zero entries equal to 1/√r. For every N ≤ d, gradient flow will generally reach a zero-training-error solution; however, not all of these solutions are the same, allowing us to explore the effect of initialization on the implicit bias.' (A data-generation sketch appears after the table.)
Researcher Affiliation | Academia | (1) The Blavatnik School of Computer Science, Tel Aviv University; (2) Technion - Israel Institute of Technology; (3) Toyota Technological Institute at Chicago.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access information for open-source code (e.g., a specific repository link or an explicit code-release statement) for the described methodology.
Open Datasets | No | The paper describes a synthetic data-generation process: 'sparse regression problem suggested by Woodworth et al. (2020), where x(1), ..., x(N) ~ N(0, I) and y(n) ~ N(⟨β*, x(n)⟩, 0.01), and β* is r-sparse with non-zero entries equal to 1/√r.' This specifies how the data are created, but no link, DOI, or repository for a pre-existing, publicly available dataset is given.
Dataset Splits | No | The paper states the dataset parameters (N = 100, d = 1000, r = 5) but provides no training/validation/test split details (e.g., percentages, sample counts, or splitting methodology).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud computing instance types) used to run its experiments.
Software Dependencies | No | The paper does not name the ancillary software needed to replicate the experiments (e.g., libraries or solvers with version numbers).
Experiment Setup | No | The paper specifies the synthetic data-generation parameters (N = 100, d = 1000, r = 5) in Section 9, but it does not report setup details for the gradient flow training itself, such as hyperparameters (learning rate, batch size, number of epochs, or optimizer settings). (A hedged gradient-flow sketch appears after the table.)
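
To make the quoted setup concrete, here is a minimal sketch of the sparse-regression data generation described in Section 9, using the reported parameters N = 100, d = 1000, r = 5. The function name, random seed handling, and choice of support indices are our own illustrative assumptions; everything else follows the quoted description.

```python
import numpy as np

def make_sparse_regression(N=100, d=1000, r=5, noise_var=0.01, seed=0):
    """Sparse regression data as described in Section 9 (Woodworth et al., 2020 setup).

    x(n) ~ N(0, I_d), y(n) ~ N(<beta*, x(n)>, noise_var), with beta* r-sparse
    and its non-zero entries equal to 1/sqrt(r), so that ||beta*||_2 = 1.
    """
    rng = np.random.default_rng(seed)
    beta_star = np.zeros(d)
    beta_star[:r] = 1.0 / np.sqrt(r)  # support choice is arbitrary by symmetry
    X = rng.standard_normal((N, d))   # rows are x(1), ..., x(N)
    y = X @ beta_star + np.sqrt(noise_var) * rng.standard_normal(N)
    return X, y, beta_star
```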
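The paper does not report how gradient flow was simulated, so the following is only a plausible sketch, not the authors' implementation: it assumes the diagonal-network parameterization β = u ⊙ v with unbalanced initialization u(0) = α·1 and v(0) = s·α·1 as one reading of 'initialization scale α and shape s' (the paper's exact parameterization may differ), and it approximates gradient flow by forward-Euler gradient descent on the squared loss. The step size, iteration budget, and stopping tolerance are illustrative choices, not values from the paper.

```python
import numpy as np

def gradient_flow_diag_net(X, y, alpha=0.01, s=1.0, lr=1e-3, steps=200_000, tol=1e-8):
    """Approximate gradient flow for a diagonal linear network beta = u * v.

    Assumptions (not from the paper): u(0) = alpha * 1 and v(0) = s * alpha * 1
    as a stand-in for 'initialization scale alpha and shape s'; forward-Euler
    discretization of gradient flow on L(beta) = ||X beta - y||^2 / (2N).
    """
    N, d = X.shape
    u = alpha * np.ones(d)
    v = s * alpha * np.ones(d)
    for _ in range(steps):
        residual = X @ (u * v) - y        # shape (N,)
        grad_beta = X.T @ residual / N    # dL/dbeta
        # Chain rule: dL/du = grad_beta * v, dL/dv = grad_beta * u (simultaneous update).
        u, v = u - lr * grad_beta * v, v - lr * grad_beta * u
        if residual @ residual / N < tol:  # near-zero training error reached
            break
    return u * v

# Usage, combined with make_sparse_regression from the previous sketch; this
# would approximate a single (alpha, s) cell of Figure 7.1, assuming the
# "population error" is the expected squared prediction error under the data
# model, i.e. ||beta_hat - beta*||^2 + noise variance:
X, y, beta_star = make_sparse_regression(N=100, d=1000, r=5)
beta_hat = gradient_flow_diag_net(X, y, alpha=0.01, s=1.0)
pop_err = float(np.sum((beta_hat - beta_star) ** 2)) + 0.01
```

The simultaneous tuple update for (u, v) matters: updating u in place before computing v's gradient would no longer discretize the same flow.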