SGD on Neural Networks Learns Functions of Increasing Complexity
Authors: Dimitris Kalimeris, Gal Kaplun, Preetum Nakkiran, Benjamin Edelman, Tristan Yang, Boaz Barak, Haofeng Zhang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. |
| Researcher Affiliation | Academia | Preetum Nakkiran (Harvard University), Gal Kaplun (Harvard University), Dimitris Kalimeris (Harvard University), Tristan Yang (Harvard University), Benjamin L. Edelman (Harvard University), Fred Zhang (Harvard University), Boaz Barak (Harvard University) |
| Pseudocode | No | The paper describes the methods and experimental setup in textual form and through figures, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We consider the following binary classification tasks: (i) Binary MNIST: predict whether the image represents a number from 0 to 4 or from 5 to 9. (ii) CIFAR-10 Animals vs Objects: predict whether the image represents an animal or an object. (iii) CIFAR-10 First 5 vs Last 5: predict whether the image is in classes {0...4} or {5...9}. (A label-construction sketch for these tasks follows the table.) |
| Dataset Splits | No | The paper uses the well-known MNIST and CIFAR-10 datasets and refers to 'train error' and 'test error', but it does not explicitly state the percentages or sample counts of the training, validation, and test splits used in its experiments, nor does it mention a dedicated validation set. |
| Hardware Specification | No | The paper describes the neural network architectures used (CNNs, MLPs) but does not specify any hardware details such as GPU models, CPU types, or cloud resources used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'vanilla SGD' and 'binary cross-entropy loss' but does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper states that it uses 'standard uniform Xavier initialization', 'binary cross-entropy loss', 'vanilla SGD without regularization', and 'a relatively small step-size for SGD', but it does not provide specific numerical values for hyperparameters such as the exact learning rate, batch size, or number of epochs. (A hedged training-setup sketch follows the table.) |
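
The three binary tasks quoted in the Open Datasets row are simple relabelings of MNIST and CIFAR-10. Since the paper releases no code, the following is a minimal sketch of how those labels could be constructed, assuming PyTorch/torchvision and the standard train splits; the animal/object grouping is inferred from the standard CIFAR-10 class names rather than stated in the quoted text.

```python
# Hypothetical sketch (not from the paper): building the three binary tasks,
# assuming torchvision's standard MNIST and CIFAR-10 datasets.
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# (i) Binary MNIST: digits 0-4 -> class 0, digits 5-9 -> class 1
mnist = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
mnist.targets = (mnist.targets >= 5).long()

# Standard CIFAR-10 label indices: 0 airplane, 1 automobile, 2 bird, 3 cat,
# 4 deer, 5 dog, 6 frog, 7 horse, 8 ship, 9 truck
cifar = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)
targets = torch.tensor(cifar.targets)

# (ii) Animals vs Objects: animal classes {2,3,4,5,6,7} -> 1, objects {0,1,8,9} -> 0
animal_classes = torch.tensor([2, 3, 4, 5, 6, 7])
animals_vs_objects = torch.isin(targets, animal_classes).long()

# (iii) First 5 vs Last 5: classes {0..4} -> 0, classes {5..9} -> 1
first5_vs_last5 = (targets >= 5).long()
```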
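
The Experiment Setup row names the ingredients of training (uniform Xavier initialization, binary cross-entropy loss, vanilla SGD without regularization) but no hyperparameter values. The sketch below shows one way that configuration could look in PyTorch; the model, learning rate, and loop structure are placeholders and are not taken from the paper.

```python
# Hypothetical sketch (not the authors' code): Xavier-initialized model trained
# with binary cross-entropy and plain SGD (no momentum, no weight decay).
import torch
import torch.nn as nn

def init_xavier_uniform(module: nn.Module) -> None:
    """Apply uniform Xavier initialization to linear and conv layers."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Placeholder MLP for a binary task on flattened 28x28 MNIST images.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 1))
model.apply(init_xavier_uniform)

criterion = nn.BCEWithLogitsLoss()               # binary cross-entropy on logits
optimizer = torch.optim.SGD(model.parameters(),  # vanilla SGD: no momentum, no weight decay
                            lr=0.01)             # lr is a placeholder, not from the paper

def train_epoch(loader):
    """One pass of plain SGD over binary-labeled data."""
    for images, labels in loader:
        optimizer.zero_grad()
        logits = model(images).squeeze(1)
        loss = criterion(logits, labels.float())
        loss.backward()
        optimizer.step()
```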