A Closer Look at Memorization in Deep Networks

Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. [...] We perform experiments on MNIST (LeCun et al., 1998) and CIFAR10 (Krizhevsky et al.) datasets. (A hedged sketch of the label-noise setup appears after this table.)
Researcher Affiliation | Academia | 1 Montréal Institute for Learning Algorithms, Canada; 2 Université de Montréal, Canada; 3 Jagiellonian University, Krakow, Poland; 4 McGill University, Canada; 5 University of California, Berkeley, USA; 6 Polytechnique Montréal, Canada; 7 University of Bonn, Bonn, Germany; 8 CIFAR Fellow; 9 CIFAR Senior Fellow.
Pseudocode | Yes | Algorithm 1 Langevin Adversarial Sample Search (LASS). (A hedged Python sketch of the search loop appears after this table.)
Open Source Code | No | The paper mentions the use of third-party libraries like Theano and Keras ('Experiments were carried out using Theano (Theano Development Team, 2016) and Keras (Chollet et al., 2015)'), but it contains no explicit statement about releasing the source code for its methodology and provides no link to a code repository for the work.
Open Datasets | Yes | We perform experiments on MNIST (LeCun et al., 1998) and CIFAR10 (Krizhevsky et al.) datasets.
Dataset Splits | No | The paper uses validation sets and discusses validation accuracy (e.g., 'We also measure the number of critical samples in the validation set' and 'the network achieves maximum accuracy on the validation set'). However, it never states the percentages or sample counts of the training/validation/test splits, nor does it cite predefined splits for these proportions beyond naming the MNIST and CIFAR10 datasets themselves.
Hardware Specification | No | The paper acknowledges 'the computing resources provided by Compute Canada and Calcul Quebec,' but it does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or other system specifications.
Software Dependencies | No | The paper states, 'Experiments were carried out using Theano (Theano Development Team, 2016) and Keras (Chollet et al., 2015).' It names the libraries and their citation years but provides no version numbers (e.g., Keras 2.2.4 or Theano 1.0.0), which are necessary for reproducible software dependencies.
Experiment Setup | Yes | If not stated otherwise, the MLPs have 4096 hidden units per layer and are trained for 1000 epochs with SGD and learning rate 0.01. The CNNs are a small Alexnet-style CNN (as in Zhang et al. (2017)), and are trained using SGD with momentum=0.9 and learning rate of 0.01, scheduled to drop by half every 15 epochs. (A hedged training-configuration sketch appears after this table.)
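
The noise-vs.-real-data experiments quoted in the Research Type row hinge on training networks on corrupted data. The paper studies both random labels and random (Gaussian) inputs; the minimal NumPy sketch below covers only the label-corruption side, and the function name, interface, and seed handling are ours, not the paper's.

import numpy as np

def corrupt_labels(y, num_classes, frac, rng=None):
    # Replace a fraction `frac` of labels with classes drawn uniformly
    # at random; this is the "label noise" side of the noise-vs.-real
    # experiments (the paper also studies Gaussian-noise *inputs*).
    rng = rng if rng is not None else np.random.default_rng(0)
    y_noisy = np.array(y, copy=True)
    mask = rng.random(len(y_noisy)) < frac
    y_noisy[mask] = rng.integers(0, num_classes, size=mask.sum())
    return y_noisy

For example, corrupt_labels(y_train, 10, frac=1.0) would give the fully random-label setting, while intermediate values of frac interpolate between real and noise data.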
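For the Pseudocode row: as we read the paper, Algorithm 1 (LASS) searches for an adversarial sample near a data point by taking signed-gradient steps perturbed with Gaussian noise and stopping once the prediction flips. The Python sketch below captures that loop under this reading; the callables `predict` and `input_grad`, the hyperparameter names, and all default values are illustrative assumptions, not the paper's.

import numpy as np

def lass(x, predict, input_grad, alpha=0.25, beta=0.2, eps=0.25,
         max_iters=100, rng=None):
    # Hedged sketch of Langevin Adversarial Sample Search (LASS).
    #   predict(x)    -> class label the network assigns to x
    #   input_grad(x) -> gradient of the loss w.r.t. the input x
    # Signed-gradient steps plus Gaussian noise, with each dimension's
    # step capped at eps; stop as soon as the predicted label flips.
    rng = rng if rng is not None else np.random.default_rng(0)
    y0 = predict(x)
    x_hat = np.asarray(x, dtype=np.float64).copy()
    for _ in range(max_iters):
        step = (alpha * np.sign(input_grad(x_hat))
                + beta * rng.normal(size=x_hat.shape))
        x_hat = x_hat + np.clip(step, -eps, eps)  # per-dimension step cap
        if predict(x_hat) != y0:
            return x_hat  # adversarial sample found near x
    return None  # no label flip within the iteration budget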
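The Experiment Setup row translates directly into code. The tf.keras sketch below encodes the stated hyperparameters (4096-unit MLP layers, SGD at learning rate 0.01, momentum 0.9 for the CNN, learning rate halved every 15 epochs); the paper itself used Theano/Keras, and the network depth, activation, loss, and input shape here are our assumptions.

import tensorflow as tf

# MLP per the paper: 4096 hidden units per layer, plain SGD, lr 0.01.
# Depth (2 hidden layers), ReLU, and the loss are our assumptions.
mlp = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),  # flattened 28x28 MNIST digit
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
mlp.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
# MLPs are trained for 1000 epochs: mlp.fit(x, y, epochs=1000)

# CNN optimizer and schedule per the paper: SGD with momentum 0.9,
# lr 0.01 dropped by half every 15 epochs (pass the callback to fit()).
cnn_optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
halve_every_15_epochs = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 0.01 * (0.5 ** (epoch // 15)))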