On the Over-Memorization During Natural, Robust and Catastrophic Overfitting

Authors: Runqi Lin, Chaojian Yu, Bo Han, Tongliang Liu

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our proposed method in mitigating overfitting across various training paradigms.
Researcher Affiliation | Academia | Runqi Lin (Sydney AI Centre, The University of Sydney, rlin0511@sydney.edu.au); Chaojian Yu (Sydney AI Centre, The University of Sydney, chyu8051@sydney.edu.au); Bo Han (Hong Kong Baptist University, bhanml@comp.hkbu.edu.hk); Tongliang Liu (Sydney AI Centre, The University of Sydney, tongliang.liu@sydney.edu.au)
Pseudocode | Yes | Algorithm 1: Distraction Over-Memorization (DOM). (A hedged sketch of how the reported hyperparameters fit into a training loop follows the table.)
Open Source Code | Yes | Our implementation can be found at https://github.com/tmllab/2024_ICLR_DOM.
Open Datasets | Yes | We conducted extensive experiments on the benchmark datasets CIFAR-10/100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011) and Tiny-ImageNet (Netzer et al., 2011). (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper does not specify exact dataset split percentages or sample counts for training, validation, and test sets. It implies standard splits by using benchmark datasets but does not explicitly state them or cite predefined splits.
Hardware Specification | Yes | The training cost (epoch/second) on CIFAR-10 using PreActResNet-18 with a single NVIDIA RTX 4090 GPU.
Software Dependencies | No | The paper mentions using the SGD optimizer with specific momentum and weight decay values, but it does not specify software versions for libraries (e.g., PyTorch, TensorFlow) or the Python interpreter.
Experiment Setup | Yes | Other hyperparameter settings, including the learning rate schedule, training epochs E, warm-up epochs K, loss threshold T, data augmentation strength β and data augmentation iterations γ, are summarized in Table 1. We train the PreActResNet-18 (He et al., 2016), WideResNet-34 (Zagoruyko & Komodakis, 2016) and ViT-small (Dosovitskiy et al., 2020) architectures on these datasets using the SGD optimizer with a momentum of 0.9 and weight decay of 5e-4. (An optimizer-setup sketch follows the table.)
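
The Algorithm 1 pseudocode referenced in the Pseudocode row is not reproduced on this page. As an orientation aid only, the following is a minimal PyTorch-style sketch of how the hyperparameters quoted in the Experiment Setup row (warm-up epochs K, loss threshold T, augmentation strength β, augmentation iterations γ) could plausibly fit together. The function name, the low-loss flagging rule and the `augment` hook are assumptions, not the authors' implementation; the exact procedure is given in Algorithm 1 of the paper and the released repository.

import torch
import torch.nn.functional as F


def train_with_dom_sketch(model, loader, optimizer, augment,
                          epochs, K, T, beta, gamma, device="cuda"):
    """Hypothetical training loop (NOT the authors' Algorithm 1): after K warm-up
    epochs, samples whose loss falls below the threshold T are treated as
    potentially over-memorized and are re-augmented (strength beta, gamma
    iterations) before contributing to the parameter update."""
    model.to(device)
    model.train()
    for epoch in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if epoch >= K:
                with torch.no_grad():
                    # First pass is only used to flag low-loss (high-confidence) samples.
                    per_sample_loss = F.cross_entropy(model(x), y, reduction="none")
                flagged = per_sample_loss < T
                if flagged.any():
                    x_aug = x[flagged]
                    for _ in range(gamma):
                        # `augment` is a user-supplied transform; `beta` controls its strength.
                        x_aug = augment(x_aug, strength=beta)
                    x = torch.cat([x[~flagged], x_aug])
                    y = torch.cat([y[~flagged], y[flagged]])
            loss = F.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()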
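
Regarding the datasets quoted in the Open Datasets row: CIFAR-10/100 and SVHN ship with torchvision, while Tiny-ImageNet must be downloaded separately. Below is a minimal loading sketch, assuming torchvision is available; the root path, transforms and batch size are placeholders, not values from the paper.

import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Placeholder transform; the paper's augmentation pipeline is not reproduced here.
transform = T.Compose([T.ToTensor()])

cifar10_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
cifar100_train = torchvision.datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
svhn_train = torchvision.datasets.SVHN(root="./data", split="train", download=True, transform=transform)

# Tiny-ImageNet is not bundled with torchvision; after downloading it and arranging
# the images in a class-per-folder layout, ImageFolder can be used:
# tiny_train = torchvision.datasets.ImageFolder("./data/tiny-imagenet-200/train", transform=transform)

train_loader = DataLoader(cifar10_train, batch_size=128, shuffle=True, num_workers=4)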
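
The optimizer settings quoted in the Experiment Setup row translate directly into PyTorch. In the sketch below, the backbone and the learning-rate value and schedule are placeholders, since the paper defers those details to its Table 1, which is not reproduced on this page.

import torch
from torchvision.models import resnet18  # stand-in backbone; the paper uses PreActResNet-18 / WideResNet-34 / ViT-small

model = resnet18(num_classes=10)

# SGD with the quoted momentum (0.9) and weight decay (5e-4); lr=0.1 is a placeholder.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# Example schedule only; the actual learning rate schedule is specified in the paper's Table 1.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)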