Simplicity Bias via Global Convergence of Sharpness Minimization

Authors: Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine our theory on the convergence to a rank one feature matrix in Figure 1 via a synthetic experiment by considering a network with m = 10 neurons on ambient dimension d = 3 and n = 3 data points. We further pick learning rate η = 0.05 and noise variance σ = 0.03 for implementing label noise SGD. Each entry of the data points is generated uniformly on [0, 1], which is the same data generating process in all the experiments. As Figure 1 shows, the second and third eigenvalues converge to zero, as predicted by Theorem 3.3.
Researcher Affiliation | Collaboration | Massachusetts Institute of Technology; Toyota Technological Institute at Chicago; Google Research.
Pseudocode | No | The paper describes the algorithms and flows (e.g., label noise SGD, Riemannian gradient flow) in narrative text and mathematical equations, but does not present them in a structured pseudocode or algorithm block format.
Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | No | Each entry of the data points is generated uniformly on [0, 1], which is the same data generating process in all the experiments.
Dataset Splits | No | The paper conducts synthetic experiments by generating data points, but it does not specify any explicit training, validation, or test dataset splits or proportions.
Hardware Specification | No | The paper describes synthetic experiments, but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud resources) used to run them.
Software Dependencies | No | The paper does not specify any software dependencies, programming languages, or library versions (e.g., Python, PyTorch, TensorFlow) used for the experiments or theoretical derivations.
Experiment Setup | Yes | We further pick learning rate η = 0.05 and noise variance σ = 0.03 for implementing label noise SGD. ... in a similar setting with the same learning rate η = 0.05 but larger σ = 0.2 ... with learning rate η = 0.1 and noise variance σ = 0.2. (A code sketch of this setup follows the table.)
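The Research Type and Experiment Setup rows describe a small label noise SGD experiment. Since the paper releases no code, the following is a minimal, hypothetical sketch of that setup, not the authors' implementation: the tanh activation, the random targets, the step count, and the choice of the n × n feature Gram matrix are assumptions made for illustration, while m = 10, d = 3, n = 3, η = 0.05, σ = 0.03, and the Uniform[0, 1] data generation come from the quoted text.

```python
import numpy as np

# Hypothetical sketch of the synthetic experiment quoted above (not the authors'
# code): a two-layer network with m = 10 neurons, input dimension d = 3, and
# n = 3 data points, trained with label noise SGD (learning rate eta = 0.05,
# label-noise std sigma = 0.03). We track the eigenvalues of an n x n feature
# Gram matrix to see whether all but the top eigenvalue shrink toward zero.

rng = np.random.default_rng(0)

m, d, n = 10, 3, 3          # neurons, input dimension, number of data points
eta, sigma = 0.05, 0.03     # learning rate and label-noise standard deviation
steps = 20_000              # step count is an assumption, not from the paper

X = rng.uniform(0.0, 1.0, size=(n, d))   # each entry uniform on [0, 1]
y = rng.uniform(0.0, 1.0, size=n)        # arbitrary targets for the sketch

W = rng.normal(scale=0.5, size=(m, d))   # first-layer weights
a = rng.normal(scale=0.5, size=m)        # second-layer weights

def forward(W, a, X):
    """Features phi = tanh(X W^T) and predictions f = phi a (activation is an assumption)."""
    phi = np.tanh(X @ W.T)               # (n, m) feature matrix
    return phi, phi @ a

for t in range(steps):
    i = rng.integers(n)                          # sample one data point
    noisy_y = y[i] + sigma * rng.normal()        # label noise SGD: perturb the label
    phi, pred = forward(W, a, X)
    resid = pred[i] - noisy_y
    # Gradients of 0.5 * (f(x_i) - noisy_y)^2 with respect to a and W.
    grad_a = resid * phi[i]
    grad_W = resid * np.outer(a * (1.0 - phi[i] ** 2), X[i])
    a -= eta * grad_a
    W -= eta * grad_W

phi, _ = forward(W, a, X)
gram = phi @ phi.T / m                           # n x n feature Gram matrix (one plausible choice)
eigvals = np.sort(np.linalg.eigvalsh(gram))[::-1]
print("eigenvalues of the feature Gram matrix:", eigvals)
```

Under the paper's Theorem 3.3, the second and third eigenvalues printed at the end are predicted to approach zero as training continues; this sketch only illustrates the procedure and is not a verified reproduction of Figure 1.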