Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency

Authors: Runqi Lin, Chaojian Yu, Bo Han, Hang Su, Tongliang Liu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.
Researcher Affiliation | Academia | 1. Sydney AI Centre, School of Computer Science, The University of Sydney, Sydney, Australia; 2. Department of Computer Science, Hong Kong Baptist University, Hong Kong, China; 3. Department of Computer Science and Technology, Institute for AI, BNRist Center, Tsinghua University, Beijing, China.
Pseudocode | Yes | The LAP algorithm is summarized in Algorithm 1.
Open Source Code | Yes | Our implementation can be found at https://github.com/tmllab/2024_ICML_LAP.
Open Datasets | Yes | We use three benchmark datasets, CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and Tiny-ImageNet (Netzer et al., 2011), for evaluating the performances of our proposed method.
Dataset Splits | No | The paper mentions using datasets such as CIFAR-10, CIFAR-100, and Tiny-ImageNet, and refers to 'test accuracy' in tables and figures. However, it does not explicitly provide train/validation/test splits (e.g., percentages, sample counts, or citations to predefined splits) in the text.
Hardware Specification | Yes | The results are obtained on a single NVIDIA RTX 4090 GPU and averaged over 30 training epochs.
Software Dependencies | No | The paper mentions using the SGD optimizer and adhering to the configurations of official repositories for baselines, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We use the cyclical learning rate schedule (Smith, 2017) spanning 30 epochs, which reaches its maximum learning rate of 0.2 at the 15th epoch... We employ the SGD optimizer with a momentum of 0.9, a weight decay of 5×10⁻⁴, the L∞-norm for input perturbation, and the L2-norm for weight perturbation. ...we set the γ as 0.3, and the detailed setting for β can be found in Table 3.
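
The quoted experiment setup maps onto a standard training configuration. Below is a minimal sketch, assuming PyTorch, a CIFAR-scale ResNet as a stand-in model, and a hypothetical steps_per_epoch value; it covers only the optimizer and cyclical learning-rate schedule quoted above, not the authors' full LAP training loop (their Algorithm 1).

```python
# Minimal sketch of the quoted training configuration (not the authors'
# implementation): SGD with momentum 0.9, weight decay 5e-4, and a cyclical
# learning rate peaking at 0.2 at epoch 15 of a 30-epoch run (Smith, 2017).
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)  # stand-in CIFAR-10 model

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.2,                 # placeholder; the scheduler drives the actual rate
    momentum=0.9,
    weight_decay=5e-4,
)

steps_per_epoch = 391       # hypothetical: CIFAR-10 with batch size 128
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=0.0,
    max_lr=0.2,
    step_size_up=15 * steps_per_epoch,    # ramp up to 0.2 by epoch 15
    step_size_down=15 * steps_per_epoch,  # decay back toward 0 by epoch 30
    cycle_momentum=False,   # keep momentum fixed at 0.9 as quoted
)

# scheduler.step() would be called after each batch inside the training loop.
# The L∞-bounded input perturbation, the L2-bounded layer-aware weight
# perturbation (controlled by γ and β), and the rest of LAP are specified in
# the paper's Algorithm 1 and are not sketched here.
```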