Truth or backpropaganda? An empirical investigation of deep learning theory

Authors: Micah Goldblum, Jonas Geiping, Avi Schwarzschild, Michael Moeller, Tom Goldstein

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike."
Researcher Affiliation | Academia | Micah Goldblum, Department of Mathematics, University of Maryland (goldblum@umd.edu); Jonas Geiping, Department of Computer Science and Electrical Engineering, University of Siegen (jonas.geiping@uni-siegen.de); Avi Schwarzschild, Department of Mathematics, University of Maryland (avi1@umd.edu); Michael Moeller, Department of Computer Science and Electrical Engineering, University of Siegen (michael.moeller@uni-siegen.de); Tom Goldstein, Department of Computer Science, University of Maryland (tomg@umd.edu)
Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or structured like code.
Open Source Code | No | The paper does not contain any statements about making source code publicly available.
Open Datasets | Yes | "We verify this by training a linear classifier on CIFAR-10... We consider image classification on CIFAR-10 and compare a two-layer MLP, a four-layer MLP, a simple 5-layer ConvNet, and a ResNet. ... In our experiments on CIFAR-10 and CIFAR-100, networks are trained using weight decay coefficients from their respective original papers."
Dataset Splits | No | The paper mentions a 'training set' and 'test data' but specifies no validation split, no explicit percentages or counts for train/val/test splits, and no citation to a predefined split.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) are provided for the experimental setup.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "Our experiments comparing regularizers all run for 300 epochs with an initial learning rate of 0.1 that decreases by a factor of 10 at epochs 100, 175, 225, and 275. We use the SGD optimizer with momentum 0.9. [...] When naturally training ResNet-18 and Skipless ResNet-18 models, we train with a batch size of 128 for 200 epochs with the learning rate initialized to 0.01 and decreasing by a factor of 10 at epochs 100, 150, 175, and 190 (for both CIFAR-10 and CIFAR-100). When adversarially training these two models on CIFAR-10 data, we use the same hyperparameters. [...] Adversarial training is done with an ℓ∞ 7-step PGD attack with a step size of 2/255 and ε = 8/255. For all of the training described above we augment the data with random crops and horizontal flips."
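The quoted setup pins down two reproducible pieces: the stepped learning-rate schedule and the PGD attack hyperparameters (7 steps, step size 2/255, ε = 8/255, ℓ∞ ball). Since the paper releases no code, the sketch below is a hypothetical reconstruction in plain NumPy: the function and variable names are ours, and the classifier is a stand-in linear model so the input gradient has a closed form, not the paper's ResNet.

```python
import numpy as np

def lr_at_epoch(epoch, base_lr=0.01, milestones=(100, 150, 175, 190)):
    """Quoted ResNet-18 schedule: start at 0.01, divide by 10 at each milestone."""
    return base_lr * 0.1 ** sum(epoch >= m for m in milestones)

def softmax(z):
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def pgd_attack(x, y, W, b, eps=8 / 255, step=2 / 255, steps=7):
    """l-infinity PGD with the paper's hyperparameters, on a linear classifier.

    x: flat image in [0, 1]; y: true label; logits are W @ x + b.
    """
    x_adv = x.copy()
    onehot = np.zeros(W.shape[0])
    onehot[y] = 1.0
    for _ in range(steps):
        # Gradient of cross-entropy w.r.t. the input for a linear model:
        # dL/dx = W^T (softmax(Wx + b) - onehot(y))
        g = W.T @ (softmax(W @ x_adv + b) - onehot)
        x_adv = x_adv + step * np.sign(g)          # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid image
    return x_adv
```

With a real network one would swap the closed-form gradient for autograd; the projection and step structure would stay the same.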