The Implicit Bias of Minima Stability: A View from Function Space
Authors: Rotem Mulayoff, Tomer Michaeli, Daniel Soudry
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now verify our theoretical predictions in experiments. We train a single-hidden-layer ReLU network using GD with varying step sizes, all initialized at the same point. Figure 3(a) shows the training data and solutions to which GD converged. |
| Researcher Affiliation | Academia | Rotem Mulayoff (Technion - Israel Institute of Technology, rotem.mulayof@gmail.com); Tomer Michaeli (Technion - Israel Institute of Technology, tomer.m@ee.technion.ac.il); Daniel Soudry (Technion - Israel Institute of Technology, daniel.soudry@gmail.com) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements or links regarding the availability of open-source code for the methodology described. |
| Open Datasets | No | The paper refers to the "empirical distribution of the data" and uses synthetic data drawn from uniform, Gaussian, and Laplace distributions for illustrative purposes, but it does not provide access information for a specific, named public dataset. |
| Dataset Splits | No | The paper does not specify exact split percentages or sample counts for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software names with version numbers used in the experiments. |
| Experiment Setup | Yes | We train a single-hidden-layer ReLU network using GD with varying step sizes, all initialized at the same point... Figure 3(c) visualizes the sharpness of the solution as a function of the learning rate. (A hedged sketch of this setup follows the table.) |
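
The quoted setup is concrete enough to sketch. Below is a minimal, hypothetical reconstruction in PyTorch: full-batch GD on a single-hidden-layer ReLU network, run from one shared initialization at several step sizes, followed by a sharpness estimate (top eigenvalue of the training-loss Hessian, computed via power iteration on Hessian-vector products). The synthetic 1D data, network width, step sizes, and iteration counts are all illustrative assumptions; the paper does not publish its exact configuration.

```python
# Hypothetical sketch of the paper's Figure 3 experiment: full-batch GD on a
# single-hidden-layer ReLU network from one fixed initialization, at several
# step sizes, then measuring the sharpness of the solution. Data, width, step
# sizes, and iteration count are assumptions, not the authors' configuration.
import copy
import torch

torch.manual_seed(0)

# Assumed synthetic 1D regression data.
n = 30
x = torch.linspace(-1.0, 1.0, n).unsqueeze(1)
y = torch.sin(3.0 * x)

def make_net():
    # Single hidden layer with ReLU activations, scalar output.
    return torch.nn.Sequential(
        torch.nn.Linear(1, 100),
        torch.nn.ReLU(),
        torch.nn.Linear(100, 1),
    )

init_net = make_net()  # shared initialization for every step size

def loss_fn(net):
    return 0.5 * ((net(x) - y) ** 2).mean()

def sharpness(net, iters=100):
    """Top Hessian eigenvalue of the loss via power iteration on HVPs."""
    params = [p for p in net.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss_fn(net), params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()  # Rayleigh quotient

for eta in [0.01, 0.03, 0.1]:  # assumed step sizes
    net = copy.deepcopy(init_net)
    for _ in range(20_000):  # full-batch gradient descent
        loss = loss_fn(net)
        grads = torch.autograd.grad(loss, net.parameters())
        with torch.no_grad():
            for p, g in zip(net.parameters(), grads):
                p -= eta * g
    print(f"step size {eta}: loss={loss_fn(net).item():.2e}, "
          f"sharpness={sharpness(net):.2f}")
```

Under the paper's stability argument, the sharpness of a minimum to which GD converges should be at most roughly 2/η for step size η, which is the kind of trend Figure 3(c) reports.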