SGD on Neural Networks Learns Functions of Increasing Complexity

Authors: Dimitris Kalimeris, Gal Kaplun, Preetum Nakkiran, Benjamin Edelman, Tristan Yang, Boaz Barak, Haofeng Zhang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks.
Researcher Affiliation | Academia | Preetum Nakkiran (Harvard University), Gal Kaplun (Harvard University), Dimitris Kalimeris (Harvard University), Tristan Yang (Harvard University), Benjamin L. Edelman (Harvard University), Fred Zhang (Harvard University), Boaz Barak (Harvard University)
Pseudocode | No | The paper describes the methods and experimental setup in textual form and through figures, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We consider the following binary classification tasks: (i) Binary MNIST: predict whether the image represents a number from 0 to 4 or from 5 to 9. (ii) CIFAR-10 Animals vs Objects: predict whether the image represents an animal or an object. (iii) CIFAR-10 First 5 vs Last 5: predict whether the image is in classes {0, ..., 4} or {5, ..., 9}. (A construction sketch for these tasks appears after this table.)
Dataset Splits | No | The paper uses well-known datasets like MNIST and CIFAR-10 and refers to 'train error' and 'test error', but it does not explicitly state the percentages or sample counts for the training, validation, and test splits used in its experiments, nor does it mention a dedicated validation set.
Hardware Specification | No | The paper describes the neural network architectures used (CNNs, MLPs) but does not specify any hardware details such as GPU models, CPU types, or cloud resources used for running the experiments.
Software Dependencies | No | The paper mentions using 'vanilla SGD' and 'binary cross-entropy loss' but does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | No | The paper states it uses 'standard uniform Xavier initialization', 'binary cross-entropy loss', 'vanilla SGD without regularization', and 'a relatively small step-size for SGD'. However, it does not provide specific numerical values for hyperparameters such as the exact learning rate, batch size, or number of epochs in the provided text. (A training-setup sketch follows this table.)
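As a point of reference for the Open Datasets row, the sketch below shows one way the three binary tasks could be derived from the standard MNIST and CIFAR-10 releases using torchvision. The use of torchvision, the relabeling code, and the animal/object class grouping are this report's assumptions; the paper itself only describes the tasks in prose.

```python
# Hypothetical sketch: deriving the three binary tasks described above from
# the standard MNIST and CIFAR-10 datasets via torchvision. The paper does not
# specify its data pipeline; the groupings below reflect the task descriptions,
# not the authors' code.
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# (i) Binary MNIST: digits 0-4 -> class 0, digits 5-9 -> class 1.
mnist = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
mnist.targets = (mnist.targets >= 5).long()

# (ii) CIFAR-10 Animals vs Objects: bird, cat, deer, dog, frog, horse
# (class indices 2-7) as animals; airplane, automobile, ship, truck as objects.
cifar_animals = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)
animal_classes = {2, 3, 4, 5, 6, 7}
cifar_animals.targets = [int(t in animal_classes) for t in cifar_animals.targets]

# (iii) CIFAR-10 First 5 vs Last 5: classes {0,...,4} -> 0, {5,...,9} -> 1.
cifar_halves = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)
cifar_halves.targets = [int(t >= 5) for t in cifar_halves.targets]
```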
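The Experiment Setup row names the ingredients of training (uniform Xavier initialization, binary cross-entropy loss, vanilla SGD without regularization, a relatively small step size) but not their numerical values. The sketch below is a minimal, hypothetical PyTorch rendering of that setup; the architecture, learning rate, and helper names are placeholders rather than the authors' configuration.

```python
# Hypothetical sketch of the setup described above: Xavier uniform
# initialization, binary cross-entropy loss, and vanilla SGD with no momentum
# or weight decay. The learning rate, batch handling, and model below are
# placeholders; the paper does not report exact hyperparameter values.
import torch
from torch import nn

def xavier_init(module):
    # Standard uniform Xavier initialization for linear/conv layers.
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

model = nn.Sequential(           # stand-in MLP; the paper also uses CNNs
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 1),           # single logit for binary classification
)
model.apply(xavier_init)

criterion = nn.BCEWithLogitsLoss()                         # binary cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)   # vanilla SGD, no regularization

def train_step(x, y):
    # One plain SGD update on a mini-batch (x, y) with 0/1 labels.
    optimizer.zero_grad()
    loss = criterion(model(x).squeeze(1), y.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```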