Can Neural Network Memorization Be Localized?
Authors: Pratyush Maini, Michael Curtis Mozer, Hanie Sedghi, Zachary Chase Lipton, J Zico Kolter, Chiyuan Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, via three experimental sources of converging evidence, we find that most layers are redundant for the memorization of examples and the layers that contribute to example memorization are, in general, not the final layers. The three sources are gradient accounting (measuring the contribution to the gradient norms from memorized and clean examples), layer rewinding (replacing specific model weights of a converged model with previous training checkpoints), and retraining (training rewound layers only on clean examples). Illustrative sketches of gradient accounting and layer rewinding follow the table. |
| Researcher Affiliation | Collaboration | School of Computer Science, Carnegie Mellon University; Google Research, Mountain View, CA. |
| Pseudocode | No | The paper describes the steps of its methods in narrative text and includes mathematical formulations, but it does not contain any clearly labeled pseudocode or algorithm blocks/figures. |
| Open Source Code | Yes | Code for reproducing our experiments can be found at https://github.com/pratyushmaini/localizing-memorization. |
| Open Datasets | Yes | We perform experiments on three image classification datasets, CIFAR-10 (Krizhevsky, 2009), MNIST (Deng, 2012), and SVHN (Netzer et al., 2011). |
| Dataset Splits | No | The paper mentions training models and evaluating them on clean and noisy data subsets, as well as on a 'Test' set, but it does not explicitly describe a validation split or give specific percentages or counts for the training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions a 'PyTorch implementation' when discussing layer grouping, but it does not specify version numbers for PyTorch or any other software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | Training Parameters. We use the one-cycle learning rate schedule (Smith, 2017) and train our models for 50 epochs with the SGD optimizer. The peak learning rate of the cyclic scheduler is 0.1, reached at the 10th epoch, and the training batch size is 512. Unless specified otherwise, we add 10% random label noise to the dataset: that is, we flip the labels of 10% of examples to an incorrect class chosen at random. A configuration sketch follows the table. |
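
As a rough illustration of the gradient-accounting probe described in the Research Type row, the sketch below compares, per parameter tensor, the gradient norm contributed by a batch of clean examples against a batch of noisy-label (memorized) examples. This is a minimal sketch assuming a standard PyTorch classifier, not the authors' released implementation; `grad_norm_by_layer` and the clean/noisy batch variables are hypothetical names.

```python
# Minimal sketch (not the authors' code): per-layer gradient norms for a
# clean batch vs. a noisy-label batch, to see which layers' gradients are
# dominated by memorized examples.
import torch
import torch.nn.functional as F

def grad_norm_by_layer(model, x, y):
    """Return {parameter name: gradient norm} for one batch."""
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return {name: p.grad.norm().item()
            for name, p in model.named_parameters() if p.grad is not None}

# Usage (model, x_clean/y_clean, x_noisy/y_noisy are placeholders):
# clean = grad_norm_by_layer(model, x_clean, y_clean)
# noisy = grad_norm_by_layer(model, x_noisy, y_noisy)
# for name in clean:
#     frac = noisy[name] / (noisy[name] + clean[name] + 1e-12)
#     print(f"{name}: fraction of gradient norm from noisy examples = {frac:.2f}")
```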
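
Layer rewinding can be approximated with PyTorch state dicts: copy the converged model, overwrite selected parameter tensors with their values from an earlier checkpoint, and compare accuracy on the noisy (memorized) and clean subsets. The checkpoint path, layer names, `evaluate` helper, and data loaders below are placeholders, not details taken from the paper.

```python
# Minimal sketch of layer rewinding (assumptions: PyTorch state dicts saved
# at several epochs; paths, layer names, and evaluation helpers are placeholders).
import copy
import torch

def rewind_layers(final_model, early_state_dict, layer_names):
    """Copy final_model and overwrite the named parameters with earlier weights."""
    rewound = copy.deepcopy(final_model)
    state = rewound.state_dict()
    for name in layer_names:
        state[name] = early_state_dict[name].clone()
    rewound.load_state_dict(state)
    return rewound

# Usage sketch:
# early_state = torch.load("ckpt_epoch05.pt")                     # hypothetical path
# rewound = rewind_layers(final_model, early_state, ["layer3.0.conv1.weight"])
# acc_noisy = evaluate(rewound, noisy_loader)                     # memorized examples
# acc_clean = evaluate(rewound, clean_loader)
```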
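
The reported training parameters map onto standard PyTorch components roughly as sketched below. Only the quantities quoted above (50 epochs, SGD, peak learning rate 0.1 at epoch 10, batch size 512, 10% random label noise) come from the paper; momentum, weight decay, and augmentation are unspecified in the excerpt and are omitted, and `add_label_noise` / `make_optimizer_and_scheduler` are illustrative names.

```python
# Minimal sketch (not the authors' code) of the reported setup: SGD, a
# one-cycle LR schedule peaking at 0.1 around epoch 10 of 50, batch size 512,
# and 10% random label noise. Unspecified hyperparameters are left out.
import numpy as np
from torch.optim import SGD
from torch.optim.lr_scheduler import OneCycleLR

def add_label_noise(labels, noise_frac=0.1, num_classes=10, seed=0):
    """Flip the labels of a random `noise_frac` of examples to a wrong class."""
    rng = np.random.default_rng(seed)
    noisy = np.array(labels)
    idx = rng.choice(len(noisy), size=int(noise_frac * len(noisy)), replace=False)
    for i in idx:
        noisy[i] = rng.choice([c for c in range(num_classes) if c != noisy[i]])
    return noisy, idx  # idx marks the noisy (to-be-memorized) examples

def make_optimizer_and_scheduler(model, steps_per_epoch, epochs=50, peak_lr=0.1):
    optimizer = SGD(model.parameters(), lr=peak_lr)
    scheduler = OneCycleLR(optimizer, max_lr=peak_lr, epochs=epochs,
                           steps_per_epoch=steps_per_epoch,
                           pct_start=10 / epochs)  # LR peaks at epoch 10
    return optimizer, scheduler

# Usage (a model and a DataLoader with batch_size=512 are assumed to exist):
# optimizer, scheduler = make_optimizer_and_scheduler(model, len(train_loader))
```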