A unifying view on implicit bias in training linear neural networks

Authors: Chulhee Yun, Shankar Krishnan, Hossein Mobahi

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also provide experiments that corroborate our analysis."
Researcher Affiliation | Collaboration | Chulhee Yun (MIT, chulheey@mit.edu); Shankar Krishnan (Google Research, skrishnan@google.com); Hossein Mobahi (Google Research, hmobahi@google.com)
Pseudocode | No | No section or figure explicitly labeled 'Pseudocode' or 'Algorithm' with structured steps was found.
Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository.
Open Datasets | No | The experiments use small, custom-defined toy datasets (e.g., a single 2-dimensional data point (x, y) = ([1, 2], 1), or the two data points (x1, y1) = ([1, 2], +1) and (x2, y2) = ([0, 3], −1)). No access information (link, DOI, repository, or formal citation to an established benchmark) is provided for these datasets, as they are illustrative examples rather than large-scale public datasets.
Dataset Splits | No | The paper does not specify train/validation/test splits. Experiments are conducted on very small, illustrative datasets, and the focus is on the trajectory of the parameters rather than performance on standard splits.
Hardware Specification | No | The paper does not provide hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for the experiments; it only describes the algorithms and training parameters.
Software Dependencies | No | The paper does not list software dependencies (e.g., library names with version numbers) needed to replicate the experiments; it describes the theoretical framework and algorithms but omits implementation details.
Experiment Setup | Yes | Regression: "We run GD with small step size η = 10⁻³ for a large enough number of iterations T = 5 × 10³." Classification: "With initial scales α ∈ {0.01, 0.5, 1}, we run GD with step size η = 5 × 10⁻⁴ for T = 2 × 10⁶ iterations." Figure 1 caption: "Networks are initialized at the same coefficients (circles on purple lines)."
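As a minimal sketch of the regression setup described above, the following runs gradient descent on a toy linear network fit to the paper's single data point. Only x, y, the step size η = 10⁻³, and the iteration count T = 5 × 10³ are taken from the text; the depth-2 architecture and the deterministic initialization are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

# Sketch of the toy regression experiment: GD on a depth-2 linear
# network f(x) = w2ᵀ W1 x, fit to the single data point
# (x, y) = ([1, 2], 1) with squared loss ½(f(x) − y)².
# The depth and initialization below are assumptions for illustration.
x = np.array([1.0, 2.0])
y = 1.0

W1 = 0.5 * np.eye(2)           # assumed deterministic initialization
w2 = np.array([0.5, 0.5])

eta, T = 1e-3, 5_000           # η = 10⁻³, T = 5 × 10³ from the paper
for _ in range(T):
    h = W1 @ x                 # hidden-layer output
    r = w2 @ h - y             # residual of the squared loss
    grad_w2 = r * h            # ∂L/∂w2
    grad_W1 = r * np.outer(w2, x)  # ∂L/∂W1
    w2 = w2 - eta * grad_w2
    W1 = W1 - eta * grad_W1

print(abs(w2 @ (W1 @ x) - y))  # residual after training; shrinks toward 0
```

With these (assumed) hyperparameters the residual decays essentially to zero well within the stated iteration budget; the paper's analysis concerns which of the many zero-loss solutions such a trajectory selects.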