Simplicity Bias via Global Convergence of Sharpness Minimization

Authors: Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine our theory on the convergence to a rank one feature matrix in Figure 1 via a synthetic experiment by considering a network with m = 10 neurons on ambient dimension d = 3 and n = 3 data points. We further pick learning rate η = 0.05 and noise variance σ = 0.03 for implementing label noise SGD. Each entry of the data points is generated uniformly on [0, 1], which is the same data generating process in all the experiments. As Figure 1 shows, the second and third eigenvalues converge to zero, as predicted by Theorem 3.3.
Researcher Affiliation | Collaboration | Massachusetts Institute of Technology; Toyota Technological Institute at Chicago; Google Research.
Pseudocode | No | The paper describes the algorithms and flows (e.g., label noise SGD, Riemannian gradient flow) in narrative text and mathematical equations, but does not present them in a structured pseudocode or algorithm block format.
Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | No | Each entry of the data points is generated uniformly on [0, 1], which is the same data generating process in all the experiments.
Dataset Splits | No | The paper conducts synthetic experiments by generating data points, but it does not specify any explicit training, validation, or test dataset splits or proportions.
Hardware Specification | No | The paper describes synthetic experiments, but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud resources) used to run them.
Software Dependencies | No | The paper does not specify any software dependencies, programming languages, or library versions (e.g., Python, PyTorch, TensorFlow) used for the experiments or theoretical derivations.
Experiment Setup | Yes | We further pick learning rate η = 0.05 and noise variance σ = 0.03 for implementing label noise SGD. ... in a similar setting with the same learning rate η = 0.05 but larger σ = 0.2 ... with learning rate η = 0.1 and noise variance σ = 0.2. (A code sketch of this setup follows the table.)
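The Research Type and Experiment Setup rows describe a small label noise SGD experiment. Since the paper releases no code, the following is a minimal, hypothetical sketch of that setup, not the authors' implementation: the tanh activation, the random targets, the step count, and the choice of the n × n feature Gram matrix are assumptions made for illustration, while m = 10, d = 3, n = 3, η = 0.05, σ = 0.03, and the Uniform[0, 1] data generation come from the quoted text.

```python
import numpy as np

# Hypothetical sketch of the synthetic experiment quoted above (not the authors'
# code): a two-layer network with m = 10 neurons, input dimension d = 3, and
# n = 3 data points, trained with label noise SGD (learning rate eta = 0.05,
# label-noise std sigma = 0.03). We track the eigenvalues of an n x n feature
# Gram matrix to see whether all but the top eigenvalue shrink toward zero.

rng = np.random.default_rng(0)

m, d, n = 10, 3, 3          # neurons, input dimension, number of data points
eta, sigma = 0.05, 0.03     # learning rate and label-noise standard deviation
steps = 20_000              # step count is an assumption, not from the paper

X = rng.uniform(0.0, 1.0, size=(n, d))   # each entry uniform on [0, 1]
y = rng.uniform(0.0, 1.0, size=n)        # arbitrary targets for the sketch

W = rng.normal(scale=0.5, size=(m, d))   # first-layer weights
a = rng.normal(scale=0.5, size=m)        # second-layer weights

def forward(W, a, X):
    """Features phi = tanh(X W^T) and predictions f = phi a (activation is an assumption)."""
    phi = np.tanh(X @ W.T)               # (n, m) feature matrix
    return phi, phi @ a

for t in range(steps):
    i = rng.integers(n)                          # sample one data point
    noisy_y = y[i] + sigma * rng.normal()        # label noise SGD: perturb the label
    phi, pred = forward(W, a, X)
    resid = pred[i] - noisy_y
    # Gradients of 0.5 * (f(x_i) - noisy_y)^2 with respect to a and W.
    grad_a = resid * phi[i]
    grad_W = resid * np.outer(a * (1.0 - phi[i] ** 2), X[i])
    a -= eta * grad_a
    W -= eta * grad_W

phi, _ = forward(W, a, X)
gram = phi @ phi.T / m                           # n x n feature Gram matrix (one plausible choice)
eigvals = np.sort(np.linalg.eigvalsh(gram))[::-1]
print("eigenvalues of the feature Gram matrix:", eigvals)
```

Under the paper's Theorem 3.3, the second and third eigenvalues printed at the end are predicted to approach zero as training continues; this sketch only illustrates the procedure and is not a verified reproduction of Figure 1.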