Memorization Through the Lens of Curvature of Loss Function Around Samples

Authors: Isha Garg, Deepak Ravikumar, Kaushik Roy

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this curvature metric effectively captures memorization statistics, both qualitatively and quantitatively, on popular image datasets. We provide quantitative validation of the proposed metric against the memorization scores released by Feldman & Zhang (2020). Further, experiments on mislabeled-data detection show that corrupted samples are learned with high curvature, and using curvature to identify mislabeled examples outperforms existing approaches.
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47906.
Pseudocode | Yes | I. Pseudo Code for Curvature Calculation: Below we present the pseudo code for the proposed curvature calculation. (A hedged, runnable sketch of this computation is given after this table.)
Open Source Code | Yes | Code available at GitHub link.
Open Datasets | Yes | In this section, we present qualitative and quantitative results on MNIST (Deng, 2012), Fashion-MNIST (Xiao et al., 2017), CIFAR-10/100 (Krizhevsky et al., 2009), and ImageNet (Russakovsky et al., 2015) to support our claim that curvature can be used to measure memorization.
Dataset Splits | Yes | The training dataset consists of 15 points for each of the 2 classes, with a noise ratio of 0.3, i.e., labels corrupted for 30% of the training data. The test dataset consists of 100 points per class with no noise. (An illustrative generation script follows the table.)
Hardware Specification | Yes | Table 9: Memory and run-time when using a GTX 1080Ti GPU with 11 GB of VRAM and an Intel Xeon with 187 GB of system memory.
Software Dependencies | No | We use the PyTorch-provided ResNet18 for ImageNet models. No specific version numbers for software dependencies are given beyond the general mention of PyTorch.
Experiment Setup | Yes | Details regarding the hyperparameters are provided in Appendix B. [...] We train for 300 epochs on the CIFAR datasets, with a learning rate of 0.1, scaled by 0.1 at the 150th and 250th epochs. For MNIST and Fashion-MNIST, we train for 200 epochs, with a learning rate of 0.1 scaled by 0.1 at the 80th and 160th epochs. For ImageNet, we train for 200 epochs with a learning rate of 0.1, scaled by 0.1 at the 120th and 160th epochs. Where weight decay is used, its value is set to 10^-4. (The CIFAR schedule is sketched in code after this table.)
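
The Pseudocode row above refers to the paper's Appendix I; the exact reference implementation is in the released repository. As a self-contained illustration, here is a minimal PyTorch sketch of input-loss curvature estimation via Hutchinson's trace estimator, using Rademacher directions and finite-difference Hessian-vector products. The function name `estimate_curvature` and the defaults `n_samples` and `h` are our choices, not the paper's.

```python
import torch

def estimate_curvature(model, criterion, x, y, n_samples=10, h=1e-3):
    """Hutchinson-style curvature proxy around input x:
    tr(H) ~= E_v[v^T H v], with the Hessian-vector product Hv
    approximated by finite differences of input gradients:
    Hv ~= (grad L(x + h*v) - grad L(x)) / h."""
    model.eval()
    x = x.detach()

    # Gradient of the loss w.r.t. the unperturbed input.
    x0 = x.clone().requires_grad_(True)
    grad0 = torch.autograd.grad(criterion(model(x0), y), x0)[0]

    estimate = 0.0
    for _ in range(n_samples):
        # Rademacher direction: entries drawn uniformly from {-1, +1}.
        v = torch.randint_like(x, 2) * 2.0 - 1.0
        xv = (x + h * v).requires_grad_(True)
        grad_v = torch.autograd.grad(criterion(model(xv), y), xv)[0]
        hvp = (grad_v - grad0) / h          # finite-difference Hv
        estimate += (v * hvp).sum().item()  # accumulate v^T H v
    return estimate / n_samples
```

Note that the paper relates curvature to memorization using estimates accumulated over the course of training rather than at a single checkpoint; consult the released code for the exact accumulation details.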
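
The toy setup in the Dataset Splits row can be reproduced in spirit as follows. Only the counts (15 training and 100 test points per class) and the 0.3 label-noise ratio come from the quoted text; the two-dimensional Gaussian blobs and the fixed seed are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: our assumption

def two_blobs(n_per_class):
    """Two 2-D Gaussian classes (an assumed stand-in for the paper's toy data)."""
    x = np.vstack([rng.normal(-1.0, 1.0, size=(n_per_class, 2)),
                   rng.normal(+1.0, 1.0, size=(n_per_class, 2))])
    y = np.repeat([0, 1], n_per_class)
    return x, y

def make_toy_split(n_train=15, n_test=100, noise_ratio=0.3):
    x_train, y_train = two_blobs(n_train)
    # Flip labels for a noise_ratio fraction of the training points.
    flip = rng.choice(len(y_train), size=int(noise_ratio * len(y_train)),
                      replace=False)
    y_train[flip] = 1 - y_train[flip]
    x_test, y_test = two_blobs(n_test)  # test split stays clean
    return (x_train, y_train), (x_test, y_test)
```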
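
The Experiment Setup row fully specifies the CIFAR learning-rate schedule, which maps directly onto PyTorch's MultiStepLR scheduler. The ResNet-18 stand-in, the momentum value, and the elided training loop below are our assumptions; only the epoch count, milestones, base learning rate, and weight decay come from the quoted text.

```python
import torch
from torchvision.models import resnet18

# CIFAR schedule from the quoted setup: 300 epochs, SGD with lr 0.1,
# decayed by 10x at epochs 150 and 250, weight decay 1e-4.
model = resnet18(num_classes=10)  # model choice: our assumption
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9,  # momentum: our assumption
                            weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 250], gamma=0.1)

for epoch in range(300):
    # ... one pass over the CIFAR training loader:
    # forward, loss, backward, optimizer.step() ...
    scheduler.step()
```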