A unifying view on implicit bias in training linear neural networks
Authors: Chulhee Yun, Shankar Krishnan, Hossein Mobahi
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide experiments that corroborate our analysis. |
| Researcher Affiliation | Collaboration | Chulhee Yun (MIT, chulheey@mit.edu); Shankar Krishnan (Google Research, skrishnan@google.com); Hossein Mobahi (Google Research, hmobahi@google.com) |
| Pseudocode | No | No section or figure explicitly labeled 'Pseudocode' or 'Algorithm' with structured steps was found. |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The experiments use small, custom-defined toy datasets (e.g., 'a single 2-dimensional data point (x, y) = ([1 2], 1)', 'two data points (x1, y1) = ([1 2], +1) and (x2, y2) = ([0 3], −1)'). No access information (link, DOI, repository, or formal citation to an established benchmark) is provided for these datasets; they appear to be illustrative examples rather than public benchmarks. |
| Dataset Splits | No | The paper does not specify train/validation/test dataset splits. Experiments are conducted on very small, illustrative datasets, and the focus is on the trajectory of parameters rather than performance on standard splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. It only describes the algorithms and training parameters. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate the experiments. It describes the theoretical framework and algorithms but omits implementation details. |
| Experiment Setup | Yes | We run GD with small step size η = 10⁻³ for large enough number of iterations T = 5 × 10³. (Regression); With initial scales α ∈ {0.01, 0.5, 1}, we run GD with step size η = 5 × 10⁻⁴ for T = 2 × 10⁶ iterations. (Classification); Networks are initialized at the same coefficients (circles on purple lines) (Figure 1 caption). |
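
For readers who want to reproduce the toy regression run quoted in the Experiment Setup row, the following is a minimal sketch of that setup: full-batch gradient descent with η = 10⁻³ for T = 5 × 10³ steps on the single data point (x, y) = ([1 2], 1). The two-layer depth, the square loss, and the 0.5 initialization scale are assumptions made here for illustration; the table fixes only the data, the step size, and the iteration count.

```python
import numpy as np

# Toy regression setup from the table: a single data point (x, y) = ([1 2], 1),
# step size eta = 10^-3, and T = 5 * 10^3 full-batch GD iterations.
x = np.array([1.0, 2.0])
y = 1.0
eta, T = 1e-3, 5_000

# Assumption: a two-layer linear network f(x) = w2 . (W1 x) trained with
# square loss; the paper treats deep linear nets in general, so this depth
# and the 0.5 init scale are illustrative choices, not the paper's spec.
rng = np.random.default_rng(0)
W1 = 0.5 * rng.standard_normal((2, 2))
w2 = 0.5 * rng.standard_normal(2)

for _ in range(T):
    pred = w2 @ (W1 @ x)
    resid = pred - y                   # dL/dpred for L = 0.5 * (pred - y)^2
    grad_W1 = np.outer(resid * w2, x)  # d pred / d W1[i, j] = w2[i] * x[j]
    grad_w2 = resid * (W1 @ x)         # d pred / d w2[i]    = (W1 x)[i]
    W1 -= eta * grad_W1
    w2 -= eta * grad_w2

# End-to-end linear coefficients beta such that f(x) = beta . x.
print("fitted coefficients:", W1.T @ w2, "prediction:", w2 @ (W1 @ x))
```

With a step size this small, the end-to-end coefficients W1ᵀw2 should closely track the gradient-flow trajectory, which is the regime the paper's implicit-bias analysis targets; the classification runs quoted above differ only in the loss, the data, and the hyperparameters (α ∈ {0.01, 0.5, 1}, η = 5 × 10⁻⁴, T = 2 × 10⁶).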