Ringing ReLUs: Harmonic Distortion Analysis of Nonlinear Feedforward Networks

Authors: Christian H.X. Ali Mehmeti-Göpel, David Hartmann, Michael Wand

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically verify the connection between blueshift and architectural choices, and provide evidence for a connection with trainability. Experiments confirm the theoretical predictions: we observe the predicted effects of depth, shortcuts, and parallel computation on blueshift, and are able to differentiate different types of nonlinearities by the decay rate of the coefficients of a polynomial approximation (a polynomial-fit sketch follows the table).
Researcher Affiliation | Academia | Christian H.X. Ali Mehmeti-Göpel (chalimeh@uni-mainz.de), David Hartmann (dahartma@uni-mainz.de), and Michael Wand (mwand@uni-mainz.de), all at the Institute of Computer Science, Johannes Gutenberg University Mainz, Staudingerweg 9, 55128 Mainz, Germany.
Pseudocode | No | The paper includes mathematical derivations and descriptions of methods but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation for our experiments is based on PyTorch 1.5 and is provided as supplementary material.
Open Datasets | Yes | Dataset: Cifar10 (Cifar100 for Figure 10)
Dataset Splits | Yes | We repeat the experiment on averaging-networks for the Cifar100 dataset, holding out 1% of the training data for validation (a holdout-split sketch follows the table).
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for the experiments.
Software Dependencies | Yes | The implementation for our experiments is based on PyTorch 1.5.
Experiment Setup | Yes | The hyper-parameters below usually reach the standard test accuracy of approximately 92-93% for a ResNet56 on Cifar10. Dataset: Cifar10 (Cifar100 for Figure 10); Epochs: 200; Scheduler: Multistep (γ = 0.1); Milestones: 100, 150; Learning rate: 0.1; Batch size: 128; Optimizer: SGD + Momentum; Momentum: 0.9; Weight decay: 0.0001; Augmentation: Random Flip.
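
The Research Type row's claim that nonlinearities can be told apart by the decay rate of the coefficients of a polynomial approximation can be illustrated with a small NumPy sketch. This is not the authors' exact procedure; the Chebyshev basis, the degree, and the interval [-1, 1] are assumptions chosen for illustration. The point it shows: a non-smooth activation such as ReLU has slowly decaying coefficients, while a smooth activation such as tanh decays much faster.

```python
import numpy as np

# Dense samples on [-1, 1]; the interval and degree are illustrative choices.
x = np.linspace(-1.0, 1.0, 2001)
deg = 30

activations = {
    "relu": np.maximum(x, 0.0),  # non-smooth: kink at the origin
    "tanh": np.tanh(x),          # smooth (analytic) on the interval
}

for name, y in activations.items():
    # Least-squares Chebyshev fit of the activation on [-1, 1].
    coeffs = np.polynomial.chebyshev.chebfit(x, y, deg)
    mags = np.abs(coeffs)
    # ReLU's even-order coefficients decay roughly like 1/k^2, whereas
    # tanh's coefficients fall off much faster with the order k.
    print(name, np.array2string(mags, precision=4))
```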
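
For the Dataset Splits row, holding out 1% of the Cifar100 training set for validation maps onto a standard torch.utils.data.random_split call. This is a minimal sketch; the fixed seed and the bare ToTensor transform are assumptions, not details restated from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Cifar100 training set; preprocessing here is deliberately minimal.
full_train = datasets.CIFAR100(root="./data", train=True, download=True,
                               transform=transforms.ToTensor())

# Hold out 1% of the 50,000 training images for validation.
n_val = len(full_train) // 100
n_train = len(full_train) - n_val
train_set, val_set = random_split(full_train, [n_train, n_val],
                                  generator=torch.Generator().manual_seed(0))
```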
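
Read as PyTorch calls, the Experiment Setup row corresponds to a training-loop skeleton along the following lines. The tiny linear model is only a runnable stand-in for the ResNet56 provided in the paper's supplementary code, and the normalization statistics are the usual Cifar10 values rather than numbers quoted from the paper; the optimizer, schedule, batch size, and augmentation mirror the listed hyper-parameters.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Augmentation: random horizontal flip, as listed in the setup row.
train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),   # assumed Cifar10 stats
                         (0.2470, 0.2435, 0.2616)),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_tf)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Stand-in model; substitute the ResNet56 from the supplementary material.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()

# SGD with momentum 0.9, weight decay 0.0001, initial learning rate 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Multistep schedule: multiply the learning rate by 0.1 at epochs 100 and 150.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 150],
                                                 gamma=0.1)

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```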