Truth or backpropaganda? An empirical investigation of deep learning theory
Authors: Micah Goldblum, Jonas Geiping, Avi Schwarzschild, Michael Moeller, Tom Goldstein
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "TRUTH OR BACKPROPAGANDA? AN EMPIRICAL INVESTIGATION OF DEEP LEARNING THEORY": We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. |
| Researcher Affiliation | Academia | Micah Goldblum, Department of Mathematics, University of Maryland (goldblum@umd.edu); Jonas Geiping, Department of Computer Science and Electrical Engineering, University of Siegen (jonas.geiping@uni-siegen.de); Avi Schwarzschild, Department of Mathematics, University of Maryland (avi1@umd.edu); Michael Moeller, Department of Computer Science and Electrical Engineering, University of Siegen (michael.moeller@uni-siegen.de); Tom Goldstein, Department of Computer Science, University of Maryland (tomg@umd.edu) |
| Pseudocode | No | The paper contains no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about making source code publicly available. |
| Open Datasets | Yes | We verify this by training a linear classifier on CIFAR-10... We consider image classification on CIFAR-10 and compare a two-layer MLP, a four-layer MLP, a simple 5-layer ConvNet, and a ResNet. ... In our experiments on CIFAR-10 and CIFAR-100, networks are trained using weight decay coefficients from their respective original papers. |
| Dataset Splits | No | The paper mentions 'training set' and 'test data' but does not specify any validation split or explicit percentages/counts for train/val/test splits, nor does it refer to a specific predefined split with citation. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) are provided for the experimental setup. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | Our experiments comparing regularizers all run for 300 epochs with an initial learning rate of 0.1, which decreases by a factor of 10 at epochs 100, 175, 225, and 275. We use the SGD optimizer with momentum 0.9. [...] When naturally training ResNet-18 and Skipless ResNet-18 models, we train with a batch size of 128 for 200 epochs with the learning rate initialized to 0.01 and decreasing by a factor of 10 at epochs 100, 150, 175, and 190 (for both CIFAR-10 and CIFAR-100). When adversarially training these two models on CIFAR-10 data, we use the same hyperparameters. [...] Adversarial training is done with a 7-step ℓ∞ PGD attack with a step size of 2/255 and ϵ = 8/255. For all of the training described above we augment the data with random crops and horizontal flips. (Illustrative sketches of this setup appear below the table.) |
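
Because the paper releases no code and names no framework (see the Software Dependencies row), the following is a hypothetical PyTorch sketch of the quoted regularizer-comparison schedule: SGD with momentum 0.9, an initial learning rate of 0.1 decayed tenfold at epochs 100, 175, 225, and 275, and random-crop/horizontal-flip augmentation on CIFAR-10. The crop padding, batch size, weight-decay value, and the use of `torchvision`'s ResNet-18 are placeholders, not details taken from the paper.

```python
# Hypothetical PyTorch sketch of the quoted training schedule; not the authors' code.
import torch
import torchvision
import torchvision.transforms as transforms

# "random crops and horizontal flips" -- padding value of 4 is an assumption.
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                         transform=transform_train)
# Batch size 128 is quoted only for the ResNet-18 runs; used here as a placeholder.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Stand-in model; the paper compares MLPs, a 5-layer ConvNet, and ResNet variants.
model = torchvision.models.resnet18(num_classes=10)

# Weight decay "from the respective original papers" -- 5e-4 is only a placeholder.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 175, 225, 275],
                                                 gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(300):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```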
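The quoted adversarial training can likewise be illustrated with a minimal ℓ∞ PGD sketch, assuming a standard Madry-style formulation with 7 steps, step size 2/255, and ε = 8/255. The random start inside the ε-ball and the pixel-range clamping below are common choices assumed here, not details specified by the paper.

```python
# Minimal l-inf PGD attack sketch matching the quoted hyperparameters; illustrative only.
import torch

def pgd_attack(model, criterion, inputs, targets, eps=8/255, alpha=2/255, steps=7):
    """Return adversarial examples within an l-inf ball of radius eps around inputs."""
    x_adv = inputs.clone().detach()
    # Random start inside the epsilon ball (an assumption; the paper does not specify it).
    x_adv = torch.clamp(x_adv + torch.empty_like(x_adv).uniform_(-eps, eps), 0.0, 1.0)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = criterion(model(x_adv), targets)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                      # ascent on the gradient sign
            x_adv = torch.clamp(x_adv, inputs - eps, inputs + eps)   # project into the l-inf ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)                     # keep pixels in valid range
    return x_adv.detach()

# Adversarial training would replace each clean batch with
# pgd_attack(model, criterion, inputs, targets) before the usual forward/backward pass.
```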