Ringing ReLUs: Harmonic Distortion Analysis of Nonlinear Feedforward Networks
Authors: Christian H.X. Ali Mehmeti-Göpel, David Hartmann, Michael Wand
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify the connection between blueshift and architectural choices, and provide evidence for a connection with trainability. Experiments confirm the theoretical predictions: We observe the predicted effects of depth, shortcuts and parallel computation on blueshift, and are able to differentiate different types of nonlinearities by the decay rate of coefficients of a polynomial approximation. |
| Researcher Affiliation | Academia | Christian H.X. Ali Mehmeti-Göpel, Institute of Computer Science, Johannes-Gutenberg University Mainz, Staudingerweg 9, 55122 Mainz, Germany, chalimeh@uni-mainz.de; David Hartmann, Institute of Computer Science, Johannes Gutenberg-University of Mainz, Staudingerweg 9, 55128 Mainz, Germany, dahartma@uni-mainz.de; Michael Wand, Institute of Computer Science, Johannes Gutenberg-University of Mainz, Staudingerweg 9, 55128 Mainz, Germany, mwand@uni-mainz.de |
| Pseudocode | No | The paper includes mathematical derivations and descriptions of methods but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementation for our experiments is based on PyTorch 1.5 and is provided as supplementary material. |
| Open Datasets | Yes | Dataset: Cifar10 (Cifar100 for Figure 10) |
| Dataset Splits | Yes | We repeat the experiment on averaging-networks for the Cifar100 dataset, holding out 1% of the training data for validation. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | Yes | The implementation for our experiments is based on PyTorch 1.5 |
| Experiment Setup | Yes | The hyper-parameters below usually reach the standard test-accuracy of approximately 92-93% for a ResNet56 on Cifar10. Dataset: Cifar10 (Cifar100 for Figure 10); Epochs: 200; Scheduler: Multistep (γ = 0.1); Milestones: 100, 150; Learning rate: 0.1; Batch size: 128; Optimizer: SGD + Momentum; Momentum: 0.9; Weight decay: 0.0001; Augmentation: Random Flip |
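
The multistep schedule in the Experiment Setup row can be sketched in plain Python. This is a minimal illustration, not the authors' code: the names `TrainConfig` and `lr_at_epoch` are hypothetical, epochs are assumed to be counted from zero, and each milestone is assumed to take effect at the epoch with that index (the convention used by common multistep schedulers). The numeric values are the ones listed in the table.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyper-parameters copied from the Experiment Setup row (names are hypothetical)."""
    epochs: int = 200
    base_lr: float = 0.1
    gamma: float = 0.1            # multiplicative decay applied at each milestone
    milestones: tuple = (100, 150)
    batch_size: int = 128
    momentum: float = 0.9
    weight_decay: float = 1e-4


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Learning rate used during a zero-based epoch under a multistep schedule.

    Assumption: the rate is multiplied by gamma once for every milestone
    that is less than or equal to the current epoch index.
    """
    drops = bisect_right(cfg.milestones, epoch)  # milestones already reached
    return cfg.base_lr * cfg.gamma ** drops


cfg = TrainConfig()
for epoch in (0, 99, 100, 149, 150, 199):
    print(f"epoch {epoch:3d}: lr = {lr_at_epoch(cfg, epoch):.4g}")
```

Under these assumptions the rate stays at 0.1 for the first 100 epochs, drops by a factor of ten at epoch 100, and drops by another factor of ten at epoch 150.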