Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction

Authors: Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally verified the sharpness-reduction phenomenon predicted by our theorem and its benefits to generalization on CIFAR-10 with VGG-11 and ResNet-20, as well as matrix completion with BN (Appendix P).
Researcher Affiliation | Academia | Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora; Department of Computer Science, Princeton University; {klyu,zhiyuanli,arora}@cs.princeton.edu
Pseudocode | No | The paper describes algorithms and theoretical steps in prose, and includes mathematical equations, but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See our supplementary material.
Open Datasets | Yes | We experimentally verified the sharpness-reduction phenomenon predicted by our theorem and its benefits to generalization on CIFAR-10 with VGG-11 and ResNet-20, as well as matrix completion with BN (Appendix P).
Dataset Splits | No | The paper states 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix Q.' However, the main text provided does not explicitly detail the training/validation/test dataset splits with percentages, counts, or a specific methodology.
Hardware Specification | No | The paper states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix Q.' However, the main text provided does not contain specific hardware details such as GPU/CPU models or memory amounts.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CUDA 11.1') in the provided main text.
Experiment Setup | No | The paper mentions a constant learning rate η̂ and weight decay λ̂ with full-batch GD, and refers to hyperparameters in Appendix Q, but it does not provide specific numerical values for hyperparameters or detailed system-level training settings in the main text.
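The last row notes that the main text only sketches the training setup: full-batch gradient descent with a constant learning rate η̂ and weight decay λ̂ on normalized networks, with concrete values deferred to Appendix Q. For readers who want a concrete picture of that kind of setup, the snippet below is a minimal, hypothetical PyTorch-style sketch, not the authors' released code: the toy data, architecture, and hyperparameter values are placeholders, and "sharpness" is estimated as the top Hessian eigenvalue via power iteration, a common proxy for the quantity the paper studies.

```python
# Hypothetical illustration only (not the authors' released code): full-batch
# gradient descent with a constant learning rate and weight decay on a small
# batch-normalized network, plus a power-iteration estimate of the top Hessian
# eigenvalue ("sharpness"). Data, architecture, and hyperparameters are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data standing in for a real dataset such as CIFAR-10.
X = torch.randn(256, 32)
y = torch.randint(0, 10, (256,))

# Small normalized network; BatchNorm plays the role of the normalization layer.
model = nn.Sequential(
    nn.Linear(32, 64, bias=False),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
loss_fn = nn.CrossEntropyLoss()

lr = 0.1             # constant learning rate (placeholder for the paper's η̂)
weight_decay = 5e-4  # weight decay (placeholder for the paper's λ̂)
opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)


def full_batch_loss():
    # Full-batch GD: the loss is computed over the entire dataset at once.
    return loss_fn(model(X), y)


def sharpness_estimate(n_iter=20):
    """Rough estimate of the top eigenvalue of the loss Hessian via power
    iteration with Hessian-vector products (a common proxy for sharpness)."""
    params = [p for p in model.parameters() if p.requires_grad]
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(n_iter):
        loss = full_batch_loss()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        hv = torch.autograd.grad(
            sum((g * u).sum() for g, u in zip(grads, v)), params
        )
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        eig = norm.item()
        v = [h / (norm + 1e-12) for h in hv]
    return eig


# Full-batch gradient descent with a constant step size.
for step in range(200):
    opt.zero_grad()
    loss = full_batch_loss()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}  sharpness ≈ {sharpness_estimate():.2f}")
```

Power iteration with Hessian-vector products is used here only because it needs nothing beyond autograd; the paper's actual experiments, hyperparameters, and measurement procedure are described in its appendices and supplementary code.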