Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Authors: Zhouzi Li, Zixuan Wang, Jian Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically identify the norm of output layer weight as an interesting indicator of the sharpness dynamics. Based on this empirical observation, we attempt to theoretically and empirically explain the dynamics of various key quantities that lead to the change of the sharpness in each phase of EOS. Moreover, based on certain assumptions, we provide a theoretical proof of the sharpness behavior in the EOS regime in two-layer fully-connected linear neural networks. |
| Researcher Affiliation | Academia | Zhouzi Li (IIIS, Tsinghua University, zhouzi188763@gmail.com); Zixuan Wang (IIIS, Tsinghua University, wangzx2019012326@gmail.com); Jian Li (IIIS, Tsinghua University, lapordge@gmail.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | As illustrated in Figure 1, we train a shallow neural network by gradient descent on a subset of 1,000 samples from CIFAR-10 (Krizhevsky et al. [17]), using the MSE loss as the objective. |
| Dataset Splits | No | The paper mentions using a 'subset of 1,000 samples from CIFAR-10' but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' in its bibliography but does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper describes training a 'shallow neural network' using 'gradient descent' and 'MSE loss' on a 'subset of 1,000 samples from CIFAR-10'. However, it does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, number of epochs), optimizer settings, or other training configuration; see the illustrative sketch after this table. |
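
Since the paper releases no code and reports no hyperparameters, nothing below comes from the authors. As a point of reference only, here is a minimal, hedged PyTorch sketch of the kind of setup the table describes: full-batch gradient descent with MSE loss on 1,000 CIFAR-10 samples, a shallow fully connected network, and periodic tracking of the sharpness (top Hessian eigenvalue, estimated by power iteration) and of the output-layer weight norm. The learning rate, hidden width, activation, and step count are illustrative assumptions, not values from the paper.

```python
# Hedged sketch, not the authors' code: full-batch GD on 1,000 CIFAR-10 samples
# with MSE loss, tracking sharpness (top Hessian eigenvalue) and the
# output-layer weight norm. Hyperparameters are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Data: a fixed subset of 1,000 CIFAR-10 training images, one-hot targets.
cifar = datasets.CIFAR10(root="./data", train=True, download=True,
                         transform=transforms.ToTensor())
X = torch.stack([cifar[i][0] for i in range(1000)]).view(1000, -1).to(device)
y = F.one_hot(torch.tensor([cifar[i][1] for i in range(1000)]),
              num_classes=10).float().to(device)

# Model: a shallow (two-layer) fully connected network; width is assumed.
model = nn.Sequential(nn.Linear(X.shape[1], 512), nn.Tanh(),
                      nn.Linear(512, 10)).to(device)
params = list(model.parameters())

def loss_fn():
    return F.mse_loss(model(X), y)

def sharpness(n_iter=20):
    """Estimate the top eigenvalue of the full-batch loss Hessian by power iteration."""
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(n_iter):
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(sum((g * u).sum() for g, u in zip(grads, v)),
                                 params)                       # Hessian-vector product
        eig = sum((h * u).sum() for h, u in zip(hv, v)).item() # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig

lr = 0.01  # illustrative; the corresponding EOS threshold would be 2 / lr = 200
for step in range(1000):
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():           # plain (full-batch) gradient descent step
        for p, g in zip(params, grads):
            p -= lr * g
    if step % 50 == 0:
        out_norm = model[2].weight.norm().item()  # output-layer weight norm
        print(f"step {step:4d}  loss {loss.item():.4f}  "
              f"sharpness {sharpness():.1f}  ||W_out|| {out_norm:.3f}")
```

In a run of this kind, progressive sharpening would show up as the printed sharpness rising over training, and the Edge-of-Stability regime as it hovering around 2/lr; the output-layer weight norm is logged because the paper identifies it as an indicator of those sharpness dynamics.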