On the Over-Memorization During Natural, Robust and Catastrophic Overfitting

Authors: Runqi Lin, Chaojian Yu, Bo Han, Tongliang Liu

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our proposed method in mitigating overfitting across various training paradigms.
Researcher Affiliation | Academia | Runqi Lin (Sydney AI Centre, The University of Sydney, rlin0511@sydney.edu.au); Chaojian Yu (Sydney AI Centre, The University of Sydney, chyu8051@sydney.edu.au); Bo Han (Hong Kong Baptist University, bhanml@comp.hkbu.edu.hk); Tongliang Liu (Sydney AI Centre, The University of Sydney, tongliang.liu@sydney.edu.au)
Pseudocode | Yes | Algorithm 1: Distraction Over-Memorization (DOM). (A hedged sketch of how the reported hyperparameters fit into a training loop follows the table.)
Open Source Code | Yes | Our implementation can be found at https://github.com/tmllab/2024_ICLR_DOM.
Open Datasets | Yes | We conducted extensive experiments on the benchmark datasets CIFAR-10/100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011) and Tiny-ImageNet (Netzer et al., 2011). (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper does not specify exact dataset split percentages or sample counts for training, validation, and test sets. It implies standard splits by using benchmark datasets but does not explicitly state them or cite predefined splits.
Hardware Specification | Yes | The training cost (epoch/second) on CIFAR-10 using PreActResNet-18 with a single NVIDIA RTX 4090 GPU.
Software Dependencies | No | The paper mentions using the SGD optimizer with specific momentum and weight decay values, but it does not specify software versions for libraries (e.g., PyTorch, TensorFlow) or the Python interpreter.
Experiment Setup | Yes | Other hyperparameter settings, including the learning rate schedule, training epochs E, warm-up epochs K, loss threshold T, data augmentation strength β and data augmentation iterations γ, are summarized in Table 1. We train the PreActResNet-18 (He et al., 2016), WideResNet-34 (Zagoruyko & Komodakis, 2016) and ViT-small (Dosovitskiy et al., 2020) architectures on these datasets using the SGD optimizer with a momentum of 0.9 and weight decay of 5e-4. (An optimizer-setup sketch follows the table.)
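
The Algorithm 1 pseudocode referenced in the Pseudocode row is not reproduced on this page. As an orientation aid only, the following is a minimal PyTorch-style sketch of how the hyperparameters quoted in the Experiment Setup row (warm-up epochs K, loss threshold T, augmentation strength β, augmentation iterations γ) could plausibly fit together. The function name, the low-loss flagging rule and the `augment` hook are assumptions, not the authors' implementation; the exact procedure is given in Algorithm 1 of the paper and the released repository.

import torch
import torch.nn.functional as F


def train_with_dom_sketch(model, loader, optimizer, augment,
                          epochs, K, T, beta, gamma, device="cuda"):
    """Hypothetical training loop (NOT the authors' Algorithm 1): after K warm-up
    epochs, samples whose loss falls below the threshold T are treated as
    potentially over-memorized and are re-augmented (strength beta, gamma
    iterations) before contributing to the parameter update."""
    model.to(device)
    model.train()
    for epoch in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if epoch >= K:
                with torch.no_grad():
                    # First pass is only used to flag low-loss (high-confidence) samples.
                    per_sample_loss = F.cross_entropy(model(x), y, reduction="none")
                flagged = per_sample_loss < T
                if flagged.any():
                    x_aug = x[flagged]
                    for _ in range(gamma):
                        # `augment` is a user-supplied transform; `beta` controls its strength.
                        x_aug = augment(x_aug, strength=beta)
                    x = torch.cat([x[~flagged], x_aug])
                    y = torch.cat([y[~flagged], y[flagged]])
            loss = F.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()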
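
Regarding the datasets quoted in the Open Datasets row: CIFAR-10/100 and SVHN ship with torchvision, while Tiny-ImageNet must be downloaded separately. Below is a minimal loading sketch, assuming torchvision is available; the root path, transforms and batch size are placeholders, not values from the paper.

import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Placeholder transform; the paper's augmentation pipeline is not reproduced here.
transform = T.Compose([T.ToTensor()])

cifar10_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
cifar100_train = torchvision.datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
svhn_train = torchvision.datasets.SVHN(root="./data", split="train", download=True, transform=transform)

# Tiny-ImageNet is not bundled with torchvision; after downloading it and arranging
# the images in a class-per-folder layout, ImageFolder can be used:
# tiny_train = torchvision.datasets.ImageFolder("./data/tiny-imagenet-200/train", transform=transform)

train_loader = DataLoader(cifar10_train, batch_size=128, shuffle=True, num_workers=4)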
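
The optimizer settings quoted in the Experiment Setup row translate directly into PyTorch. In the sketch below, the backbone and the learning-rate value and schedule are placeholders, since the paper defers those details to its Table 1, which is not reproduced on this page.

import torch
from torchvision.models import resnet18  # stand-in backbone; the paper uses PreActResNet-18 / WideResNet-34 / ViT-small

model = resnet18(num_classes=10)

# SGD with the quoted momentum (0.9) and weight decay (5e-4); lr=0.1 is a placeholder.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# Example schedule only; the actual learning rate schedule is specified in the paper's Table 1.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)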