Leveraging Unlabeled Data to Track Memorization
Authors: Mahsa Forouzesh, Hanie Sedghi, Patrick Thiran
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show the effectiveness of our metric in tracking memorization on various architectures and datasets and provide theoretical insights into the design of the susceptibility metric. Finally, we show through extensive experiments on datasets with synthetic and real-world label noise that one can utilize susceptibility and the overall training accuracy to distinguish models that maintain a low memorization on the training set and generalize well to unseen clean data. |
| Researcher Affiliation | Collaboration | Mahsa Forouzesh (École Polytechnique Fédérale de Lausanne), Hanie Sedghi (Google Research, Brain Team), Patrick Thiran (École Polytechnique Fédérale de Lausanne) |
| Pseudocode | Yes | Algorithm 1 Computes the susceptibility to noisy labels ζ |
| Open Source Code | Yes | The source code to reproduce our results is available in the supplementary material. |
| Open Datasets | Yes | All the datasets and architectures used in this paper are publicly available and are properly cited. Reproducibility of all our experiments is ensured by providing the experimental setup details in Appendix B. The data processing steps are provided both in Appendix B and in the source code. |
| Dataset Splits | Yes | We modify original datasets similarly to Chatterjee (2020); for a fraction of samples, denoted by the label noise level (LNL), we replace the labels with independent random variables drawn uniformly from {1, ..., c} for a dataset with c classes. (A hedged label-noise sketch follows the table.) |
| Hardware Specification | Yes | Each of our experiments takes a few hours to run on a single Nvidia Titan X Maxwell GPU. |
| Software Dependencies | No | The paper mentions using the SGD and Adam optimizers, but does not specify the versions of any software libraries (e.g., Python, PyTorch/TensorFlow). |
| Experiment Setup | Yes | The models are trained for 200 epochs on the cross-entropy objective function using SGD with weight decay 5 × 10⁻⁴ and batch size 128. The neural network architecture options are: CNN (a simple 5-layer convolutional neural network), DenseNet (Huang et al., 2017), EfficientNet (Tan & Le, 2019) (with scale = 0.5, 0.75, 1, 1.25, 1.5), GoogLeNet (Szegedy et al., 2015), MobileNet (Howard et al., 2017) (with scale = 0.5, 0.75, 1, 1.25, 1.5), ResNet (He et al., 2016a), MobileNetV2 (Sandler et al., 2018) (with scale = 0.5, 0.75, 1, 1.25, 1.5), PreActResNet (He et al., 2016b), RegNet (Radosavovic et al., 2020), ResNeXt (Xie et al., 2017), SENet (Hu et al., 2018), ShuffleNetV2 (Ma et al., 2018) (with scale = 0.5, 1, 1.5, 2), DLA (Yu et al., 2018), and VGG (Simonyan & Zisserman, 2014). The learning rate value options are: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5. The learning rate schedule options are: cosine annealing with T_max = 200, cosine annealing with T_max = 100, cosine annealing with T_max = 50, and no learning rate schedule. Momentum value options are 0.9 and 0. (A hedged training-loop sketch follows the table.) |
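
The Dataset Splits row quotes the paper's label-noise construction. Below is a minimal NumPy sketch of one reading of that construction: a fraction LNL of the samples has its labels replaced by labels drawn uniformly at random over all c classes (so a noisy label may coincide with the original one). The function name `inject_label_noise` and the seeding are illustrative choices, not part of the paper's released code.

```python
import numpy as np

def inject_label_noise(labels, lnl, num_classes, seed=0):
    """Replace the labels of a fraction `lnl` of samples with labels drawn
    uniformly at random from the c classes (0-indexed here)."""
    rng = np.random.default_rng(seed)
    noisy_labels = np.asarray(labels).copy()
    n = len(noisy_labels)
    # Pick round(lnl * n) samples without replacement to receive random labels.
    noisy_idx = rng.choice(n, size=int(round(lnl * n)), replace=False)
    # Uniform over all classes, so a drawn label may equal the true label.
    noisy_labels[noisy_idx] = rng.integers(0, num_classes, size=noisy_idx.size)
    return noisy_labels, noisy_idx
```

For example, `inject_label_noise(train_labels, lnl=0.2, num_classes=10)` would randomize the labels of 20% of a 10-class training set.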
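
The Experiment Setup row lists the optimization hyperparameters. The sketch below assembles them into a PyTorch training loop for illustration; PyTorch itself is an assumption (the Software Dependencies row notes that the paper does not name its framework or versions), and the `train_model` function, device handling, and data pipeline are placeholders rather than the authors' code. It shows one configuration (learning rate 0.1, momentum 0.9, cosine annealing with T_max = 200); the other listed values would be swapped in analogously.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_model(model, train_set, device="cuda", epochs=200, lr=0.1):
    # Hyperparameters quoted in the row above: cross-entropy loss, SGD with
    # weight decay 5e-4, batch size 128, momentum in {0.9, 0}, and cosine
    # annealing with T_max in {200, 100, 50} or no schedule.
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    model.to(device)
    model.train()
    for _ in range(epochs):
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()  # one scheduler step per epoch
    return model
```

Any of the architectures listed in the table (e.g., a torchvision ResNet or DenseNet) could be passed in as `model`; the sketch does not depend on a particular one.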