What training reveals about neural network complexity

Authors: Andreas Loukas, Marinos Poiitis, Stefanie Jegelka

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our findings in the context of two tasks: Task 1. Regression of a sinusoidal function with increasing frequency... Task 2. CIFAR classification under label corruption... In agreement with previous studies [4, 6, 3, 2], Figure 2 shows that training slows down as the complexity of the fitted function increases. Figures 2b and 2e depict the per-epoch bias trajectory...
Researcher Affiliation | Academia | Andreas Loukas (EPFL, andreas.loukas@epfl.ch); Marinos Poiitis (Aristotle University of Thessaloniki, mpoiitis@csd.auth.gr); Stefanie Jegelka (MIT, stefje@mit.edu)
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code availability for the described methodology.
Open Datasets | Yes | Task 2. CIFAR classification under label corruption. In our second experiment, we trained a convolutional neural network (CNN) to classify 10000 images from the dog and airplane classes of CIFAR10 [74]. (A hedged data-loading sketch for this subset appears after the table.)
Dataset Splits | No | The paper mentions training on '100 randomly generated training points' for Task 1 and '10000 images from the dog and airplane classes of CIFAR10' for Task 2, but does not provide specific train/validation/test split information (percentages, counts, or references to predefined splits).
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers needed to replicate the experiments.
Experiment Setup | Yes | We trained an MLP with 5 layers consisting entirely of ReLU activations and with the 1st layer weights being identity. We repeated the experiment 10 times, each time training the network with SGD using a learning rate of 0.001 and an MSE loss until it had fit the sinusoidal function at 100 randomly generated training points. We set the first layer identically with the regression experiment. We repeated the experiment 8 times, each time training the network with SGD using a BCE loss and a learning rate of 0.0025. Let f(t) be a depth-d NN with ReLU activations being trained with SGD, a BCE loss and 1/2-Dropout. (A hedged sketch of the Task 1 training setup appears directly after the table.)
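
As one concrete reading of the Task 1 description quoted in the Experiment Setup row, the sketch below builds a 5-layer ReLU MLP whose first layer is frozen to the identity and trains it with SGD (learning rate 0.001) and an MSE loss on 100 random points of a sinusoid. The hidden width, the target frequency, the tiling of the scalar input to match a square first layer, and the stopping tolerance are assumptions, not details taken from the paper.

```python
# Hedged sketch of the quoted Task 1 setup: 5-layer ReLU MLP, frozen identity first
# layer, SGD with lr = 0.001, MSE loss, 100 random training points of a sinusoid.
# Width, frequency, input tiling, and the stopping tolerance are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
width, freq = 64, 3.0                                   # assumed hidden width and frequency

# 100 randomly generated training points of a sinusoidal target.
x = torch.rand(100, 1) * 2 * torch.pi
y = torch.sin(freq * x)

layers = [nn.Linear(width, width), nn.ReLU()]           # 1st layer: frozen to the identity below
for _ in range(3):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, 1)]                         # 5 linear layers in total
model = nn.Sequential(*layers)

with torch.no_grad():
    model[0].weight.copy_(torch.eye(width))             # identity first-layer weights
    model[0].bias.zero_()
for p in model[0].parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.001
)
loss_fn = nn.MSELoss()

x_tiled = x.repeat(1, width)                            # tile the scalar input to the layer width
for epoch in range(100_000):
    optimizer.zero_grad()
    loss = loss_fn(model(x_tiled), y)
    loss.backward()
    optimizer.step()
    if loss.item() < 1e-3:                              # stop once the sinusoid is (roughly) fit
        break
```

Repeating this loop over 10 seeds, as the quoted setup describes, would give the per-run trajectories the paper aggregates.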
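
For the Open Datasets row, a minimal sketch of how the two-class CIFAR-10 subset could be assembled with torchvision follows. The standard CIFAR-10 class indices (airplane = 0, dog = 5) yield exactly 10,000 training images; the transform, batch size, and 0/1 relabelling to match the BCE loss mentioned for Task 2 are assumptions, and the CNN itself and the label-corruption procedure are omitted.

```python
# Hedged sketch: assembling the airplane-vs-dog CIFAR-10 subset described in the
# Open Datasets row. Class indices follow the standard CIFAR-10 labelling
# (airplane = 0, dog = 5); transform, batch size, and 0/1 relabelling are assumptions.
import torch
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

transform = transforms.ToTensor()
cifar = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)

# Keep only airplane (0) and dog (5); 5,000 images per class gives 10,000 in total.
keep = {0: 0.0, 5: 1.0}
indices = [i for i, y in enumerate(cifar.targets) if y in keep]
subset = Subset(cifar, indices)

def collate(batch):
    """Stack images and relabel targets to {0, 1} for a binary (BCE) objective."""
    images, targets = zip(*batch)
    x = torch.stack(images)
    y = torch.tensor([keep[int(t)] for t in targets])
    return x, y

loader = DataLoader(subset, batch_size=128, shuffle=True, collate_fn=collate)
```

The label-corruption step and the CNN from Task 2 would be layered on top of this loader; they are not reproduced here.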