The Implicit Bias of Minima Stability: A View from Function Space

Authors: Rotem Mulayoff, Tomer Michaeli, Daniel Soudry

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now verify our theoretical predictions in experiments. We train a single-hidden-layer ReLU network using GD with varying step sizes, all initialized at the same point. Figure 3(a) shows the training data and solutions to which GD converged."
Researcher Affiliation | Academia | Rotem Mulayoff (Technion - Israel Institute of Technology, rotem.mulayof@gmail.com); Tomer Michaeli (Technion - Israel Institute of Technology, tomer.m@ee.technion.ac.il); Daniel Soudry (Technion - Israel Institute of Technology, daniel.soudry@gmail.com)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | No | The paper refers to the "empirical distribution of the data" and uses synthetic data drawn from uniform, Gaussian, and Laplace distributions for illustration, but does not provide access information for a specific, named public dataset.
Dataset Splits | No | The paper does not specify exact split percentages or sample counts for training, validation, or test sets.
Hardware Specification | No | The paper does not provide details about the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper does not specify any software packages with version numbers used in the experiments.
Experiment Setup | Yes | "We train a single-hidden-layer ReLU network using GD with varying step sizes, all initialized at the same point... Figure 3(c) visualizes the sharpness of the solution as a function of the learning rate." (A hypothetical sketch of this setup is given after the table.)
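
For concreteness, the following is a minimal, hypothetical sketch of the experiment quoted in the table above. It is not the authors' code; the framework (PyTorch), network width, synthetic 1D data, iteration budget, and step-size grid are all assumptions. It trains a single-hidden-layer ReLU network with full-batch GD at several step sizes from a shared initialization, then estimates the sharpness of each converged solution (top eigenvalue of the training-loss Hessian, via power iteration on Hessian-vector products) so it can be compared against the linear-stability bound 2/η, where η is the step size.

```python
# Hypothetical sketch (not the authors' code): full-batch GD on a
# single-hidden-layer ReLU network at several step sizes, all runs starting
# from the same initialization, followed by a sharpness estimate of the
# solution. Width, data, step sizes, and iteration counts are assumptions.
import copy
import torch

torch.manual_seed(0)

# Synthetic 1D regression data (stand-in for the paper's univariate examples).
n = 30
x = torch.linspace(-1.0, 1.0, n).unsqueeze(1)
y = torch.sin(3.0 * x) + 0.1 * torch.randn(n, 1)

def make_net(width=100):
    return torch.nn.Sequential(
        torch.nn.Linear(1, width),
        torch.nn.ReLU(),
        torch.nn.Linear(width, 1),
    )

init_net = make_net()  # shared initialization reused for every step size

def sharpness(net, iters=50):
    """Top eigenvalue of the training-loss Hessian via power iteration on HVPs."""
    params = list(net.parameters())
    loss = torch.nn.functional.mse_loss(net(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()  # Rayleigh quotient

# Step sizes are illustrative; too large a step size may fail to converge.
for eta in [0.02, 0.05, 0.1, 0.2]:
    net = copy.deepcopy(init_net)                    # same starting point
    opt = torch.optim.SGD(net.parameters(), lr=eta)  # full-batch GD, no momentum
    for _ in range(20000):
        opt.zero_grad()
        torch.nn.functional.mse_loss(net(x), y).backward()
        opt.step()
    print(f"lr={eta:.2f}  sharpness~{sharpness(net):.2f}  stability bound 2/lr={2.0 / eta:.1f}")
```

Under the minima-stability argument, the printed sharpness of each solution should not exceed 2/η, so larger step sizes should settle at flatter minima, consistent with the sharpness-versus-learning-rate plot the paper reports in Figure 3(c).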