Label Noise SGD Provably Prefers Flat Global Minimizers
Authors: Alex Damian, Tengyu Ma, Jason D. Lee
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Section 4 presents experimental results which support our theory. Finally, Section 6 discusses the implications of this work." And, from Section 4 (Experiments): "In order to test the ability of SGD with label noise to escape poor global minimizers and converge to better minimizers, we initialize Algorithm 1 at global minimizers of the training loss which achieve 100% training accuracy yet generalize poorly to the test set." |
| Researcher Affiliation | Academia | Alex Damian, Princeton University (ad27@princeton.edu); Tengyu Ma, Stanford University (tengyuma@stanford.edu); Jason Lee, Princeton University (jasonlee@princeton.edu) |
| Pseudocode | Yes | Algorithm 1: SGD with Label Noise. Input: θ₀, step size η, noise variance σ², batch size B, steps T (see the first sketch after this table). |
| Open Source Code | No | Code will be submitted through the supplementary material and will be made available (through Github) upon acceptance. |
| Open Datasets | Yes | Experiments were run with ResNet18 on CIFAR10 [17] without data augmentation or weight decay. For CIFAR10 we cite Krizhevsky [17], as requested by the creators on https://www.cs.toronto.edu/~kriz/cifar.html. |
| Dataset Splits | No | The paper mentions 'training accuracy' and 'test accuracy' in Section 4, but it does not specify the use of a separate validation split, its size, or how it was created. |
| Hardware Specification | Yes | The experiments were performed on NVIDIA GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | No | The code was implemented in PyTorch [24] and PyTorch Lightning [6], and Weights & Biases [2] was used for experiment tracking. The libraries are named, but no version numbers are specified. |
| Experiment Setup | Yes | Experiments were run with ResNet18 on CIFAR10 [17] without data augmentation or weight decay. The experiments were conducted with randomized label flipping with probability 0.2 (see Appendix E for the extension of Theorem 1 to classification with label flipping), cross entropy loss, and batch size 256 (see the second sketch after this table). |
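
For concreteness, here is a minimal PyTorch sketch of Algorithm 1 as quoted in the Pseudocode row: each step draws a fresh minibatch, perturbs the labels with Gaussian noise of variance σ², and takes a plain SGD step. The function name, the in-memory data interface, and the default hyperparameter values are illustrative assumptions, not the authors' released code; for the regression setting one would pass, e.g., `loss_fn=F.mse_loss`.

```python
import torch

def sgd_with_label_noise(model, loss_fn, data, targets,
                         eta=0.01, sigma=0.1, batch_size=256, steps=1000):
    """Algorithm 1 sketch: SGD where each minibatch's labels receive
    fresh Gaussian noise of variance sigma^2 before the gradient step."""
    opt = torch.optim.SGD(model.parameters(), lr=eta)
    n = data.shape[0]
    for _ in range(steps):
        idx = torch.randint(0, n, (batch_size,))   # sample a minibatch of size B
        x, y = data[idx], targets[idx]
        y_noisy = y + sigma * torch.randn_like(y)  # label noise: eps ~ N(0, sigma^2)
        loss = loss_fn(model(x), y_noisy)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```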
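Likewise, a sketch matching the Experiment Setup row: ResNet18 on CIFAR10 with randomized label flipping at probability 0.2, cross entropy loss, and batch size 256. Whether a flip may land on the original class is not stated in the quoted text, so the uniform choice below is an assumption, as are the learning rate and the data-loading details.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18

def flip_labels(y, num_classes=10, p=0.2):
    """With probability p, replace each label by a uniformly random class
    (assumption: the flip may land on the original class)."""
    flip = torch.rand(y.shape, device=y.device) < p
    rand = torch.randint(0, num_classes, y.shape, device=y.device)
    return torch.where(flip, rand, y)

# ResNet18 on CIFAR10 without data augmentation, per the setup row.
loader = DataLoader(
    datasets.CIFAR10("data", train=True, download=True,
                     transform=transforms.ToTensor()),
    batch_size=256, shuffle=True)
model = resnet18(num_classes=10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # lr value is an assumption

for x, y in loader:
    y_noisy = flip_labels(y, p=0.2)            # label flipping with probability 0.2
    loss = F.cross_entropy(model(x), y_noisy)  # cross entropy loss, as in Section 4
    opt.zero_grad()
    loss.backward()
    opt.step()
```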