A Closer Look at Memorization in Deep Networks
Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data.' [...] 'We perform experiments on MNIST (LeCun et al., 1998) and CIFAR10 (Krizhevsky et al.) datasets.' |
| Researcher Affiliation | Academia | 1 Montréal Institute for Learning Algorithms, Canada; 2 Université de Montréal, Canada; 3 Jagiellonian University, Krakow, Poland; 4 McGill University, Canada; 5 University of California, Berkeley, USA; 6 Polytechnique Montréal, Canada; 7 University of Bonn, Bonn, Germany; 8 CIFAR Fellow; 9 CIFAR Senior Fellow. |
| Pseudocode | Yes | Algorithm 1 Langevin Adversarial Sample Search (LASS); a hedged sketch of this procedure appears after the table. |
| Open Source Code | No | The paper mentions the use of third-party libraries like Theano and Keras ('Experiments were carried out using Theano (Theano Development Team, 2016) and Keras (Chollet et al., 2015)'). However, it does not contain any explicit statement about releasing the source code for the methodology described in the paper, nor does it provide a link to a code repository for their work. |
| Open Datasets | Yes | We perform experiments on MNIST (LeCun et al., 1998) and CIFAR10 (Krizhevsky et al.) datasets. |
| Dataset Splits | No | The paper mentions the use of validation sets and discusses 'validation accuracy' (e.g., 'We also measure the number of critical samples in the validation set,' and 'the network achieves maximum accuracy on the validation set'). However, it does not provide the specific percentages or sample counts of the training, validation, and test splits, nor does it cite predefined splits for these proportions; it only names the datasets themselves (MNIST and CIFAR10). |
| Hardware Specification | No | The paper acknowledges 'the computing resources provided by Compute Canada and Calcul Quebec,' but it does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or other detailed system specifications. |
| Software Dependencies | No | The paper states, 'Experiments were carried out using Theano (Theano Development Team, 2016) and Keras (Chollet et al., 2015).' While it names the software and cites their reference publications, it does not provide specific version numbers for these libraries (e.g., Keras 2.2.4 or Theano 1.0.0), which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | If not stated otherwise, the MLPs have 4096 hidden units per layer and are trained for 1000 epochs with SGD and learning rate 0.01. The CNNs are small AlexNet-style CNNs (as in Zhang et al. (2017)), trained using SGD with momentum = 0.9 and a learning rate of 0.01, scheduled to drop by half every 15 epochs. A hedged Keras sketch of this setup appears after the table. |
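For readers who want to experiment with the paper's critical-sample analysis, here is a minimal NumPy sketch of a LASS-style search, written from the algorithm's high-level description: repeat a gradient-sign step plus Gaussian noise (the "Langevin" part), clip back to an L-infinity ball around the original sample, and stop when the predicted label flips. The function names (`predict_fn`, `grad_fn`) and all default hyperparameters are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

def lass(x, predict_fn, grad_fn, alpha=0.25, beta=0.2, eps=0.5,
         max_iter=100, rng=None):
    """Sketch of a Langevin Adversarial Sample Search (LASS)-style procedure.

    predict_fn(x) -> predicted class label for input x
    grad_fn(x)    -> gradient of the predicted-class score w.r.t. x
    alpha, beta   -> step sizes for the gradient-sign and noise terms
    eps           -> L-infinity radius the search is confined to

    All parameter names and defaults are assumptions for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    y0 = predict_fn(x)          # label of the starting point
    x_tilde = x.copy()
    for _ in range(max_iter):
        # Langevin-style update: gradient-sign step plus Gaussian noise
        delta = alpha * np.sign(grad_fn(x_tilde)) \
                + beta * rng.standard_normal(x.shape)
        x_tilde = x_tilde + delta
        # project back into the eps-ball around the original sample
        x_tilde = np.clip(x_tilde, x - eps, x + eps)
        if predict_fn(x_tilde) != y0:
            return x_tilde      # adversarial sample found near the boundary
    return None                 # no label flip within the iteration budget
```

A returned sample certifies that a decision boundary passes within `eps` of the input, which is how the paper counts "critical samples."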
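The paper's experiments used Theano and Keras; the sketch below re-creates the quoted CNN optimization settings (SGD, momentum 0.9, learning rate 0.01 halved every 15 epochs) in modern Keras with a TensorFlow backend. The layer stack is only a placeholder, since "small AlexNet-style CNN" is not fully specified in this table; the optimizer and schedule are the parts taken from the quoted setup.

```python
import tensorflow as tf

def lr_schedule(epoch, lr):
    # Halve the learning rate every 15 epochs, starting from 0.01.
    return 0.01 * (0.5 ** (epoch // 15))

# Placeholder "small AlexNet-style" stack for CIFAR10-shaped inputs;
# the layer sizes here are assumptions, not the paper's architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 5, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(384, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Example training call (data loading omitted):
# model.fit(x_train, y_train, epochs=60,
#           callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])
```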