Phenomenology of Double Descent in Finite-Width Neural Networks
Authors: Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To empirically demonstrate the validity of our theoretical results, we carry out the entire procedure of obtaining double descent for neural networks. Figure 1 shows the results of running the double descent experiments for the settings of two and three layer fully-connected networks with ReLU activation trained for 5K epochs via SGD. (A minimal architecture sketch appears after this table.) |
| Researcher Affiliation | Academia | Sidak Pal Singh (a, c), Aurelien Lucchi (b), Thomas Hofmann (a) and Bernhard Schölkopf (c); (a) ETH Zürich, Switzerland; (b) Department of Mathematics and Computer Science, University of Basel; (c) MPI for Intelligent Systems, Tübingen, Germany |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The most recent version of our paper can be found on arXiv and the code for the experiments is available on GitHub. |
| Open Datasets | Yes | In terms of the dataset, we primarily utilize MNIST1D (Greydanus, 2020), which is a downscaled version of MNIST yet designed to be significantly harder than the usual version. However, we also present results on CIFAR10 and the usual (easier) MNIST... |
| Dataset Splits | No | The paper mentions using a "test set" for evaluating population loss, but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts to allow reproduction of the data partitioning. |
| Hardware Specification | No | The paper mentions the computational cost and the precision used ("FLOAT64"), but it does not specify any particular hardware components like GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with their version numbers, such as programming language versions or library versions. |
| Experiment Setup | Yes | We train all the networks via SGD with learning rate 0.5 and learning rate decay by a factor of 0.75 after each quarter of the target number of epochs. (See the training-loop sketch after this table.) |
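
To make the quoted Research Type setting concrete, the following is a minimal sketch, not the authors' released code, of two- and three-layer fully-connected ReLU networks whose hidden width would be swept to trace out model-wise double descent. The specific width sweep, the 40-dimensional MNIST-1D input size, and the use of a single shared hidden width are assumptions for illustration.

```python
# Minimal sketch (not the authors' released code) of the two- and three-layer
# fully-connected ReLU networks used in the double-descent experiments.
# The width sweep, the 40-dimensional MNIST-1D input size, and the single shared
# hidden width are assumptions for illustration.
import torch.nn as nn


def make_mlp(in_dim: int, width: int, num_classes: int, num_layers: int = 2) -> nn.Sequential:
    """Build a fully-connected ReLU network with `num_layers` linear layers."""
    layers, d = [], in_dim
    for _ in range(num_layers - 1):                  # hidden layers with ReLU
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, num_classes))         # linear output layer
    return nn.Sequential(*layers)


# Sweeping the hidden width moves the model through the interpolation threshold,
# which is how the model-wise double-descent curve is traced out.
widths = [4, 8, 16, 32, 64, 128, 256]                # assumed sweep, for illustration
two_layer = {w: make_mlp(in_dim=40, width=w, num_classes=10, num_layers=2) for w in widths}
three_layer = {w: make_mlp(in_dim=40, width=w, num_classes=10, num_layers=3) for w in widths}
```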
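
The Experiment Setup row fully specifies the optimizer and learning-rate schedule. Below is a minimal PyTorch sketch of that schedule: SGD with learning rate 0.5, decayed by a factor of 0.75 after each quarter of the target number of epochs (5K in the quoted experiments). The model, data loader, and loss function are placeholders or assumptions, not the authors' implementation.

```python
# Minimal sketch of the stated optimization schedule: SGD with learning rate 0.5,
# decayed by a factor of 0.75 after each quarter of the target number of epochs
# (5K epochs in the quoted experiments). The model, data loader, and loss function
# are placeholders/assumptions, not the authors' implementation.
import torch
from torch.optim.lr_scheduler import MultiStepLR


def train(model, train_loader, target_epochs: int = 5000, device: str = "cpu"):
    model.to(device)
    criterion = torch.nn.CrossEntropyLoss()          # loss not specified in the table; assumed
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
    quarters = [target_epochs // 4, target_epochs // 2, 3 * target_epochs // 4]
    scheduler = MultiStepLR(optimizer, milestones=quarters, gamma=0.75)

    for _ in range(target_epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()                             # 0.75x decay at each quarter mark
    return model
```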