Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Authors: Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally verified the sharpness-reduction phenomenon predicted by our theorem and its benefits to generalization on CIFAR-10 with VGG-11 and ResNet-20, as well as matrix completion with BN (Appendix P). (A hedged sketch of one common sharpness probe follows this table.) |
| Researcher Affiliation | Academia | Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora; Department of Computer Science, Princeton University; {klyu,zhiyuanli,arora}@cs.princeton.edu |
| Pseudocode | No | The paper describes algorithms and theoretical steps in prose, and includes mathematical equations, but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | The paper's NeurIPS checklist states: '3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See our supplementary material.' |
| Open Datasets | Yes | We experimentally verified the sharpness-reduction phenomenon predicted by our theorem and its benefits to generalization on CIFAR-10 with VGG-11 and ResNet-20, as well as matrix completion with BN (Appendix P). |
| Dataset Splits | No | The paper's checklist states 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix Q.' However, the main text does not explicitly detail the training/validation/test splits with percentages, counts, or a split methodology. |
| Hardware Specification | No | The paper's checklist states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix Q.' However, the main text does not contain specific hardware details such as GPU/CPU models or memory amounts. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CUDA 11.1') in the provided main text. |
| Experiment Setup | No | The paper mentions a 'constant learning rate η̂ and weight decay λ̂' and 'full-batch GD', and defers hyperparameters to Appendix Q, but the main text gives no specific numerical hyperparameter values or detailed system-level training settings. (A hedged sketch of such a full-batch GD setup follows this table.) |
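For concreteness, here is a minimal sketch of the training scheme the Experiment Setup row alludes to: full-batch gradient descent with a constant learning rate and weight decay on a network containing a normalization layer. This is our own illustrative PyTorch code; the model, data tensors, and the values of `eta_hat` and `lambda_hat` are placeholders, not the paper's actual settings (those are deferred to its Appendix Q).

```python
import torch
import torch.nn as nn

# Placeholder model with a BatchNorm layer; the paper's experiments use
# VGG-11 and ResNet-20, which are not reproduced here.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),   # normalization layer (makes preceding weights scale-invariant)
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

eta_hat = 0.1      # constant learning rate (placeholder value)
lambda_hat = 5e-4  # weight decay coefficient (placeholder value)
loss_fn = nn.CrossEntropyLoss()

# Full-batch GD: every step uses the entire training set (random stand-ins here).
x_all = torch.randn(128, 3, 32, 32)
y_all = torch.randint(0, 10, (128,))

opt = torch.optim.SGD(model.parameters(), lr=eta_hat, weight_decay=lambda_hat)
for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x_all), y_all)
    loss.backward()
    opt.step()
```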
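Likewise, the 'sharpness' mentioned in the Research Type row is commonly probed as the largest eigenvalue of the training-loss Hessian. The sketch below estimates it by power iteration over autograd Hessian-vector products; `top_hessian_eigenvalue` is a hypothetical helper of ours, and the paper's exact sharpness quantity may be defined differently.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50):
    """Estimate the largest-magnitude eigenvalue of the loss Hessian by
    power iteration, using autograd Hessian-vector products. This is a
    common sharpness proxy; the paper's precise definition may differ."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v /= v.norm()
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: d(grad . v) / d(params)
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = (v @ hv).item()         # Rayleigh quotient estimate
        v = hv / (hv.norm() + 1e-12)
    return eig

# Example usage (with the placeholder model/loss from the previous sketch):
#   loss = loss_fn(model(x_all), y_all)   # fresh forward pass, graph still alive
#   params = [p for p in model.parameters() if p.requires_grad]
#   sharpness = top_hessian_eigenvalue(loss, params)
```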