Memorization Through the Lens of Curvature of Loss Function Around Samples
Authors: Isha Garg, Deepak Ravikumar, Kaushik Roy
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this curvature metric effectively captures memorization statistics, both qualitatively and quantitatively in popular image datasets. We provide quantitative validation of the proposed metric against memorization scores released by Feldman & Zhang (2020). Further, experiments on mislabeled data detection show that corrupted samples are learned with high curvature and using curvature for identifying mislabelled examples outperforms existing approaches. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47906. |
| Pseudocode | Yes | I. Pseudo Code for Curvature Calculation: "Below we present the pseudo code for the proposed curvature calculation." A hedged sketch of such a calculation is given after this table. |
| Open Source Code | Yes | Code available at github link. |
| Open Datasets | Yes | In this section, we present qualitative and quantitative results on MNIST (Deng, 2012), Fashion MNIST (Xiao et al., 2017), CIFAR10/100 (Krizhevsky et al., 2009), and ImageNet (Russakovsky et al., 2015) datasets to support our claim that curvature can be used to measure memorization. |
| Dataset Splits | Yes | The training dataset consists of 15 points for each of the 2 classes, with a noise ratio of 0.3 introduced to 30% of the data. The test dataset consists of 100 points for each class with no noise. |
| Hardware Specification | Yes | Table 9: Memory and Run-Time when using GTX 1080Ti GPU with 11GB of VRAM and Intel Xeon with 187GB of system memory |
| Software Dependencies | No | We use the PyTorch-provided ResNet18 for ImageNet models. No specific version numbers for software dependencies are provided beyond the general mention of PyTorch. |
| Experiment Setup | Yes | Details regarding the hyperparameters are provided in Appendix B. [...] We train for 300 epochs on CIFAR datasets, with a learning rate of 0.1, scaled by 0.1 on the 150th and 250th epoch. For MNIST and Fashion MNIST, we train for 200 epochs, with a learning rate of 0.1 scaled by 0.1 on the 80th and 160th epoch. For ImageNet we train for 200 epochs with a learning rate of 0.1, scaled by 0.1 on the 120th and 160th epoch. Where weight decay is used, its value is set to 10⁻⁴. A sketch of the CIFAR schedule is given after this table. |
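The "Pseudocode" row above points to the paper's pseudo code for curvature calculation. As a rough illustration, not the authors' exact code, the sketch below estimates per-sample curvature as a Hutchinson-style trace of the Hessian of the loss with respect to the input, using finite differences of input gradients along random Rademacher directions; the function name, step size `h`, and number of directions are our assumptions.

```python
import torch
import torch.nn.functional as F

def input_curvature(model, x, y, h=1e-3, n_dirs=10):
    """Hutchinson-style estimate of tr(H), where H is the Hessian of the
    per-sample loss with respect to the input x. Each v^T H v term is
    approximated by a finite difference of input gradients along a random
    Rademacher direction v. Hyperparameters h and n_dirs are illustrative."""
    model.eval()

    def input_grad(inp):
        inp = inp.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(inp), y)
        (grad,) = torch.autograd.grad(loss, inp)
        return grad

    g0 = input_grad(x)  # gradient at the unperturbed sample
    estimate = 0.0
    for _ in range(n_dirs):
        v = torch.randint_like(x, 0, 2) * 2.0 - 1.0      # Rademacher direction in {-1, +1}
        gv = input_grad(x + h * v)                        # gradient at the perturbed sample
        estimate += torch.sum(v * (gv - g0)).item() / h   # ≈ v^T H v
    return estimate / n_dirs
```

For a single image this would be called as `input_curvature(net, img.unsqueeze(0), label.unsqueeze(0))`; the exact normalization and averaging follow the paper's Appendix I, which should be treated as authoritative.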
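The "Experiment Setup" row quotes the CIFAR training recipe. A minimal sketch of that schedule in PyTorch follows, assuming plain SGD with momentum 0.9 and torchvision's ResNet-18; only the initial learning rate (0.1), the decay milestones (epochs 150 and 250 out of 300), and the weight decay (10⁻⁴) come from the quoted text.

```python
import torch
from torchvision.models import resnet18

# Sketch of the quoted CIFAR schedule: 300 epochs, lr 0.1 decayed by 0.1 at
# epochs 150 and 250, weight decay 1e-4. Momentum 0.9 and the choice of
# MultiStepLR are our assumptions, not taken from the quoted setup.
model = resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)

for epoch in range(300):
    # ... one pass over the CIFAR-10/100 training set ...
    scheduler.step()
```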