Simplicity Bias via Global Convergence of Sharpness Minimization
Authors: Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examine our theory on the convergence to a rank-one feature matrix in Figure 1 via a synthetic experiment, considering a network with m = 10 neurons, ambient dimension d = 3, and n = 3 data points. We further pick learning rate η = 0.05 and noise variance σ = 0.03 for implementing label noise SGD. Each entry of the data points is generated uniformly on [0, 1], which is the same data-generating process used in all the experiments. As Figure 1 shows, the second and third eigenvalues converge to zero, as predicted by Theorem 3.3. (A hedged reproduction sketch of this setup follows the table.) |
| Researcher Affiliation | Collaboration | ¹Massachusetts Institute of Technology, ²Toyota Technological Institute at Chicago, ³Google Research. |
| Pseudocode | No | The paper describes the algorithms and flows (e.g., label noise SGD, Riemannian gradient flow) in narrative text and mathematical equations, but does not present them in a structured pseudocode or algorithm block format. |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | No public dataset is used; the experiments rely on synthetic data only: "Each entry of the data points is generated uniformly on [0, 1], which is the same data-generating process used in all the experiments." |
| Dataset Splits | No | The paper conducts synthetic experiments by generating data points, but it does not specify any explicit training, validation, or test dataset splits or proportions. |
| Hardware Specification | No | The paper describes synthetic experiments, but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud resources) used to run these experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies, programming languages, or library versions (e.g., Python, PyTorch, TensorFlow) used for the experiments or theoretical derivations. |
| Experiment Setup | Yes | We further pick learning rate η = 0.05 and noise variance σ = 0.03 for implementing label noise SGD. ... in a similar setting with the same learning rate η = 0.05 but larger σ = 0.2 ... with learning rate η = 0.1 and noise variance σ = 0.2. (See the hyperparameter sweep sketch after the table.) |
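
Below is a minimal, hedged reproduction sketch of the synthetic experiment quoted in the "Research Type" row. Only the quoted quantities come from the paper (m = 10 neurons, d = 3, n = 3 data points, η = 0.05, σ = 0.03, data entries uniform on [0, 1]); the two-layer tanh architecture, the random targets, the step count, and the definition of the feature matrix as the n × m matrix of post-activations are assumptions made for illustration, not the paper's exact parameterization.

```python
import numpy as np


def run_label_noise_sgd(eta=0.05, sigma=0.03, m=10, d=3, n=3,
                        steps=20_000, seed=0):
    """Run label-noise SGD on a small two-layer tanh network and return the
    singular values of the post-activation feature matrix.

    Only m, d, n, eta, sigma, and the uniform-[0, 1] data come from the quoted
    text; the architecture, targets, and step count are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, d))          # entries uniform on [0, 1]
    y = rng.uniform(0.0, 1.0, size=n)               # assumed targets
    W = rng.normal(size=(m, d))                     # first-layer weights
    a = rng.normal(scale=1.0 / np.sqrt(m), size=m)  # second-layer weights

    for _ in range(steps):
        i = rng.integers(n)                         # pick one training point
        noisy_y = y[i] + sigma * rng.normal()       # label-noise perturbation
        phi = np.tanh(W @ X[i])                     # hidden activations, shape (m,)
        err = a @ phi - noisy_y                     # residual on the noisy label
        # Gradients of the squared loss 0.5 * err**2 w.r.t. a and W.
        grad_a = err * phi
        grad_W = err * np.outer(a * (1.0 - phi**2), X[i])
        a -= eta * grad_a
        W -= eta * grad_W

    feature_matrix = np.tanh(X @ W.T)               # assumed n x m feature matrix
    return np.linalg.svd(feature_matrix, compute_uv=False)


if __name__ == "__main__":
    # The quoted claim is that the second and third eigenvalues of the feature
    # matrix converge to zero (Theorem 3.3); here we inspect the singular values
    # of the assumed n x m post-activation matrix as a proxy.
    print(run_label_noise_sgd())
```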
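
The "Experiment Setup" row quotes three hyperparameter configurations. Assuming the sketch above, they could be swept as follows; the pairing of η and σ follows the quoted text, while everything else (architecture, targets, step count) remains the same illustrative setup.

```python
# Sweep over the (learning rate, noise std) settings quoted in the
# "Experiment Setup" row; step count and seed are illustrative choices.
for eta, sigma in [(0.05, 0.03), (0.05, 0.2), (0.1, 0.2)]:
    svals = run_label_noise_sgd(eta=eta, sigma=sigma)
    print(f"eta={eta}, sigma={sigma} -> singular values {np.round(svals, 4)}")
```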