What training reveals about neural network complexity
Authors: Andreas Loukas, Marinos Poiitis, Stefanie Jegelka
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our findings in the context of two tasks: Task 1. Regression of a sinusoidal function with increasing frequency... Task 2. CIFAR classification under label corruption... In agreement with previous studies [4, 6, 3, 2], Figure 2 shows that training slows down as the complexity of the fitted function increases. Figures 2b and 2e depict the per-epoch bias trajectory... |
| Researcher Affiliation | Academia | Andreas Loukas EPFL andreas.loukas@epfl.ch Marinos Poiitis Aristotle University of Thessaloniki mpoiitis@csd.auth.gr Stefanie Jegelka MIT stefje@mit.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code availability for the described methodology. |
| Open Datasets | Yes | Task 2. CIFAR classification under label corruption. In our second experiment, we trained a convolutional neural network (CNN) to classify 10000 images from the dog and airplane classes of CIFAR10 [74]. (A data-loading sketch for this subset follows the table.) |
| Dataset Splits | No | The paper mentions training on '100 randomly generated training points' for Task 1 and '10000 images from the dog and airplane classes of CIFAR10' for Task 2, but does not provide specific train/validation/test dataset split information (percentages, counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory) to run its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers needed to replicate the experiment. |
| Experiment Setup | Yes | We trained an MLP with 5 layers consisting entirely of ReLU activations and with the 1st layer weights being identity. We repeated the experiment 10 times, each time training the network with SGD using a learning rate of 0.001 and an MSE loss until it had fit the sinusoidal function at 100 randomly generated training points. ... We set the first layer identically with the regression experiment. We repeated the experiment 8 times, each time training the network with SGD using a BCE loss and a learning rate of 0.0025. ... Let f(t) be a depth-d NN with ReLU activations being trained with SGD, a BCE loss and 1/2-Dropout. (Illustrative training sketches for both tasks follow the table.) |
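
The Experiment Setup row fixes only part of the Task 1 configuration (5-layer ReLU MLP, identity first-layer weights, SGD with learning rate 0.001, MSE loss, 100 random training points on a sinusoid). Below is a minimal PyTorch sketch of that setup under stated assumptions: the hidden width, the sinusoid frequency, the epoch count, and the exact construction of the "identity" first layer are not given in the paper and are chosen here for illustration only.

```python
# Illustrative sketch (not the authors' code) of the Task 1 regression setup:
# a 5-layer ReLU MLP whose first-layer weights are frozen to a (padded)
# identity, trained with SGD (lr=0.001) and an MSE loss on 100 random points
# of a sinusoid. WIDTH, FREQ, and the epoch count are assumptions.
import torch
import torch.nn as nn

WIDTH, FREQ, N_POINTS = 64, 3.0, 100   # assumed values; the paper varies the frequency

class ReLUMLP(nn.Module):
    def __init__(self, width=WIDTH, depth=5):
        super().__init__()
        first = nn.Linear(1, width, bias=False)
        with torch.no_grad():           # freeze first-layer weights to a padded identity
            first.weight.zero_()
            first.weight[0, 0] = 1.0
        first.weight.requires_grad_(False)
        layers = [first, nn.ReLU()]
        for _ in range(depth - 2):      # three trainable hidden layers
            layers += [nn.Linear(width, width), nn.ReLU()]
        layers += [nn.Linear(width, 1)] # scalar regression output
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

x = torch.rand(N_POINTS, 1) * 2 - 1                 # 100 randomly generated training points
y = torch.sin(2 * torch.pi * FREQ * x)              # sinusoidal target of a chosen frequency

model = ReLUMLP()
opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.001)
loss_fn = nn.MSELoss()
for epoch in range(20000):                          # train until the sinusoid is fit
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```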
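
For Task 2, the Open Datasets and Experiment Setup rows pin down the data subset (the 10,000 airplane and dog training images of CIFAR-10), the optimizer (SGD, learning rate 0.0025), and the BCE loss, but not the CNN architecture. The sketch below illustrates that pipeline under assumptions: the network, batch size, and epoch count are placeholders, and the paper's label corruption and identity first layer are omitted.

```python
# Illustrative sketch (not the authors' code) of the Task 2 data pipeline:
# the 10,000 "airplane" and "dog" training images of CIFAR-10, a small CNN,
# SGD with lr=0.0025, and a BCE loss. Architecture and epochs are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

cifar = datasets.CIFAR10(root="./data", train=True, download=True,
                         transform=transforms.ToTensor())
keep = {0: 0.0, 5: 1.0}                      # CIFAR-10 labels: 0 = airplane, 5 = dog
idx = [i for i, t in enumerate(cifar.targets) if t in keep]   # 2 x 5000 = 10000 images
loader = DataLoader(Subset(cifar, idx), batch_size=128, shuffle=True)

model = nn.Sequential(                       # assumed architecture, for illustration only
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 8 * 8, 1),  # single logit for binary classification
)
opt = torch.optim.SGD(model.parameters(), lr=0.0025)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(10):
    for x, y in loader:
        target = torch.tensor([keep[int(t)] for t in y]).unsqueeze(1)  # map to {0, 1}
        opt.zero_grad()
        loss = loss_fn(model(x), target)
        loss.backward()
        opt.step()
```

A single-logit head with `BCEWithLogitsLoss` matches the paper's binary (dog vs. airplane) framing and BCE loss; a two-class softmax head would be an equally plausible reading.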