Phenomenology of Double Descent in Finite-Width Neural Networks

Authors: Sidak Pal Singh, Aurélien Lucchi, Thomas Hofmann, Bernhard Schölkopf

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To empirically demonstrate the validity of our theoretical results, we carry out the entire procedure of obtaining double descent for neural networks. Figure 1 shows the results of running the double descent experiments for the settings of two- and three-layer fully-connected networks with ReLU activation trained for 5K epochs via SGD. (See the architecture sketch after the table.)
Researcher Affiliation | Academia | Sidak Pal Singh (a, c), Aurélien Lucchi (b), Thomas Hofmann (a) and Bernhard Schölkopf (c); (a) ETH Zürich, Switzerland; (b) Department of Mathematics and Computer Science, University of Basel; (c) MPI for Intelligent Systems, Tübingen, Germany
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The most recent version of our paper can be found on arXiv and the code for the experiments is available on GitHub.
Open Datasets | Yes | In terms of the dataset, we primarily utilize MNIST1D (Greydanus, 2020), which is a downscaled version of MNIST yet designed to be significantly harder than the usual version. However, we also present results on CIFAR10 and the usual (easier) MNIST...
Dataset Splits | No | The paper mentions using a "test set" for evaluating the population loss, but it does not specify explicit training/validation/test splits (percentages or sample counts) that would allow the data partitioning to be reproduced.
Hardware Specification | No | The paper mentions the computational cost and the numerical precision used ("FLOAT64"), but it does not name the hardware (e.g., GPU or CPU models) on which the experiments were run.
Software Dependencies | No | The paper does not explicitly list software dependencies with version numbers, such as programming-language or library versions.
Experiment Setup | Yes | We train all the networks via SGD with learning rate 0.5 and learning rate decay by a factor of 0.75 after each quarter of the target number of epochs. (A sketch of this configuration appears after the table.)
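
The experiment description in the Research Type row (two- and three-layer fully-connected ReLU networks trained on MNIST1D) lends itself to a short sketch. The following is a minimal illustration assuming a PyTorch implementation; the helper name `make_mlp` and the defaults `in_dim=40` / `n_classes=10` (the MNIST1D input dimension and label count) are illustrative choices, not taken from the authors' code, and the hidden `width` is presumably the quantity varied to trace the double-descent curve.

    import torch.nn as nn

    def make_mlp(depth: int, width: int, in_dim: int = 40, n_classes: int = 10) -> nn.Sequential:
        """Fully-connected ReLU network with `depth` weight layers.

        depth=2 and depth=3 correspond to the two- and three-layer
        networks described in the table above.
        """
        layers, d = [], in_dim
        for _ in range(depth - 1):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, n_classes))
        return nn.Sequential(*layers)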
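
The Experiment Setup row fully specifies the optimizer schedule, which can be written down as a hedged sketch, again assuming PyTorch; the helper name `configure_training` and the 5K-epoch default are illustrative, the 5K figure coming from the experiment description quoted in the first row.

    import torch
    from torch.optim.lr_scheduler import MultiStepLR

    def configure_training(model: torch.nn.Module, total_epochs: int = 5000):
        # SGD with learning rate 0.5, as quoted in the Experiment Setup row.
        optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
        # Decay the learning rate by a factor of 0.75 after each quarter of
        # the target number of epochs, i.e. at 25%, 50% and 75% of total_epochs.
        milestones = [total_epochs // 4, total_epochs // 2, 3 * total_epochs // 4]
        scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=0.75)
        return optimizer, scheduler

Here `scheduler.step()` would be called once per epoch; whether a further decay is also applied at the very end of training is not stated in the quoted text, so only the three in-training milestones are listed.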