On the Over-Memorization During Natural, Robust and Catastrophic Overfitting
Authors: Runqi Lin, Chaojian Yu, Bo Han, Tongliang Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our proposed method in mitigating overfitting across various training paradigms. |
| Researcher Affiliation | Academia | Runqi Lin, Sydney AI Centre, The University of Sydney (rlin0511@sydney.edu.au); Chaojian Yu, Sydney AI Centre, The University of Sydney (chyu8051@sydney.edu.au); Bo Han, Hong Kong Baptist University (bhanml@comp.hkbu.edu.hk); Tongliang Liu, Sydney AI Centre, The University of Sydney (tongliang.liu@sydney.edu.au) |
| Pseudocode | Yes | Algorithm 1: Distraction Over-Memorization (DOM) |
| Open Source Code | Yes | Our implementation can be found at https://github.com/tmllab/2024_ICLR_DOM. |
| Open Datasets | Yes | We conducted extensive experiments on the benchmark datasets CIFAR-10/100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011) and Tiny-ImageNet. |
| Dataset Splits | No | The paper does not specify exact dataset split percentages or sample counts for training, validation, and test sets. It implies standard splits by using benchmark datasets but does not explicitly state them or cite predefined splits. |
| Hardware Specification | Yes | The training cost (seconds/epoch) on CIFAR-10 using PreAct ResNet-18 with a single NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using the SGD optimizer with specific momentum and weight decay values, but it does not specify software versions for libraries (e.g., PyTorch, TensorFlow) or the Python interpreter. |
| Experiment Setup | Yes | Other hyperparameter settings, including the learning rate schedule, training epochs E, warm-up epoch K, loss threshold T, data augmentation strength β and data augmentation iteration γ, are summarized in Table 1. We train the PreAct ResNet-18 (He et al., 2016), WideResNet-34 (Zagoruyko & Komodakis, 2016) and ViT-small (Dosovitskiy et al., 2020) architectures on these datasets using the SGD optimizer with a momentum of 0.9 and weight decay of 5e-4. |
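
The Pseudocode row above points to Algorithm 1, Distraction Over-Memorization (DOM). Below is a minimal sketch of the augmentation variant suggested by the listed hyperparameters (warm-up epoch K, loss threshold T, augmentation strength β and iterations γ), assuming PyTorch. The function name `dom_step`, the constants `WARMUP_K` and `THRESHOLD_T`, and the `augment` callback are illustrative placeholders, not the authors' code; see the linked repository for the reference implementation.

```python
# Hypothetical sketch of a DOM-style training step (not the authors' code).
import torch
import torch.nn.functional as F

WARMUP_K = 10      # warm-up epochs before DOM intervenes (paper's K)
THRESHOLD_T = 0.1  # per-example loss threshold (paper's T)

def dom_step(model, optimizer, x, y, epoch, augment):
    """One step: disturb examples the model has already over-memorized.

    `augment` is a callback that returns re-augmented inputs of the same
    shape; the paper's strength beta and iterations gamma would parameterize it.
    """
    logits = model(x)
    per_example_loss = F.cross_entropy(logits, y, reduction="none")

    if epoch >= WARMUP_K:
        # Flag over-memorized examples: loss already below the threshold T.
        memorized = per_example_loss < THRESHOLD_T
        if memorized.any():
            # Re-augment only those examples so their memorized patterns
            # are disturbed, then recompute the loss on the modified batch.
            x = x.clone()
            x[memorized] = augment(x[memorized])
            logits = model(x)
            per_example_loss = F.cross_entropy(logits, y, reduction="none")

    loss = per_example_loss.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```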
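Since the paper relies on the benchmark datasets' predefined partitions rather than stating split percentages, a reproduction would typically load them as below. This torchvision snippet is an assumption about how the standard CIFAR-10 train/test split would be obtained, not something the paper specifies.

```python
# Standard CIFAR-10 train/test split via torchvision (assumed; the paper
# does not state split sizes). train=True holds 50,000 images, train=False 10,000.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=to_tensor)
print(len(train_set), len(test_set))  # 50000 10000
```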
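The quoted experiment setup translates directly into an optimizer configuration. A minimal sketch, assuming PyTorch: torchvision's `resnet18` stands in for the paper's PreAct ResNet-18 (torchvision ships no pre-activation variant), and the initial learning rate of 0.1 is an assumption, since the actual schedule is deferred to the paper's Table 1.

```python
import torch
from torchvision.models import resnet18

# Stand-in for PreAct ResNet-18 (He et al., 2016). Only momentum and weight
# decay below are taken from the paper; the learning rate is assumed.
model = resnet18(num_classes=10)
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # assumed initial value; schedule per the paper's Table 1
    momentum=0.9,       # stated in the paper
    weight_decay=5e-4,  # stated in the paper
)
```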