Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Authors: Zixuan Wang, Zhouzi Li, Jian Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically identify the norm of output layer weight as an interesting indicator of the sharpness dynamics. Based on this empirical observation, we attempt to theoretically and empirically explain the dynamics of various key quantities that lead to the change of the sharpness in each phase of EOS. Moreover, based on certain assumptions, we provide a theoretical proof of the sharpness behavior in the EOS regime in two-layer fully-connected linear neural networks. |
| Researcher Affiliation | Academia | Zhouzi Li IIIS, Tsinghua University EMAIL Zixuan Wang IIIS, Tsinghua University EMAIL Jian Li IIIS, Tsinghua University EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | As illustrated in Figure 1, we train a shallow neural network by gradient descent on a subset of 1,000 samples from CIFAR-10 (Krizhevsky et al. [17]), using the MSE loss as the objective. |
| Dataset Splits | No | The paper mentions using a 'subset of 1,000 samples from CIFAR-10' but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Pytorch' in its bibliography but does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper describes training a 'shallow neural network' using 'gradient descent' and 'MSE loss' on a 'subset of 1,000 samples from CIFAR-10'. However, it does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, number of epochs), optimizer settings, or detailed training configurations. |
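To make the reported setup concrete, the following is a minimal NumPy sketch of the kind of experiment the paper describes: full-batch gradient descent on a shallow (two-layer linear) network with MSE loss, while estimating sharpness (the largest Hessian eigenvalue of the training loss) via power iteration on finite-difference Hessian-vector products. This is not the authors' code; synthetic Gaussian data stands in for the 1,000-sample CIFAR-10 subset, and all sizes, the learning rate, and the step count are illustrative assumptions, since the paper does not report them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 64, 10, 8          # samples, input dim, hidden width (assumed)
X = rng.standard_normal((n, d))
y = rng.standard_normal((n, 1))

def unpack(theta):
    """Split the flat parameter vector into the two weight matrices."""
    W = theta[:d * h].reshape(d, h)
    a = theta[d * h:].reshape(h, 1)
    return W, a

def loss(theta):
    """MSE training loss of the two-layer linear network X @ W @ a."""
    W, a = unpack(theta)
    return 0.5 / n * np.sum((X @ W @ a - y) ** 2)

def grad(theta, eps=1e-5):
    """Central-difference gradient (analytic gradients would be used in practice)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return g

def sharpness(theta, iters=30, eps=1e-4):
    """Estimate the top Hessian eigenvalue by power iteration,
    using H v ~ (grad(theta + eps*v) - grad(theta - eps*v)) / (2*eps)."""
    v = rng.standard_normal(theta.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = (grad(theta + eps * v) - grad(theta - eps * v)) / (2 * eps)
        lam = float(v @ hv)
        v = hv / (np.linalg.norm(hv) + 1e-12)
    return lam

theta = 0.1 * rng.standard_normal(d * h + h)
init_loss = loss(theta)
lr = 0.05                    # assumed step size
for step in range(50):       # full-batch gradient descent
    theta -= lr * grad(theta)

print("initial loss:", init_loss)
print("final loss:", loss(theta))
print("sharpness estimate:", sharpness(theta))
```

In the edge-of-stability regime studied by the paper, one would track this sharpness estimate at every step of training and compare it against the stability threshold 2/lr; the sketch above only evaluates it once, at the end, to keep the example short.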