Understanding Robust Overfitting of Adversarial Training and Beyond

Authors: Chaojian Yu, Bo Han, Li Shen, Jun Yu, Chen Gong, Mingming Gong, Tongliang Liu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that they not only eliminate robust overfitting, but also further boost adversarial robustness.
Researcher Affiliation | Collaboration | 1 TML Lab, Sydney AI Centre, The University of Sydney, Sydney, Australia; 2 Department of Computer Science, Hong Kong Baptist University, Hong Kong, China; 3 JD Explore Academy, Beijing, China; 4 Department of Automation, University of Science and Technology of China, Hefei, China; 5 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; 6 School of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia.
Pseudocode | Yes | Algorithm 1 MLCAT-prototype (in a mini-batch).
Open Source Code | Yes | Our implementation is based on PyTorch and the code is publicly available at https://github.com/ChaojianYu/Understanding-Robust-Overfitting
Open Datasets | Yes | We conduct extensive experiments across three benchmark datasets (CIFAR10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011) and CIFAR100 (Krizhevsky et al., 2009)).
Dataset Splits | No | The paper mentions training data and test data but does not explicitly provide details about a validation split or its use for hyperparameter tuning or early stopping.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper states 'Our implementation is based on PyTorch' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | For training, the model is trained for 200 epochs using SGD with momentum 0.9, weight decay 5×10⁻⁴, and an initial learning rate of 0.1. The learning rate is divided by 10 at the 100-th and 150-th epoch, respectively. Standard data augmentation, including random crops with 4 pixels of padding and random horizontal flips, is applied for CIFAR10 and CIFAR100, and no data augmentation is used on SVHN. For the adversary, a 10-step PGD attack is applied: for the L∞ threat model, perturbation size ε = 8/255, with step size 1/255 for SVHN and 2/255 for both CIFAR10 and CIFAR100; for the L2 threat model, perturbation size ε = 128/255 and step size 15/255 for all datasets, which is a standard setting for PGD-based adversarial training (Madry et al., 2017). For testing, model robustness is evaluated by measuring the accuracy on test data under different adversarial attacks, including 20-step PGD (PGD-20) (Madry et al., 2017) and AutoAttack (AA) (Croce & Hein, 2020b).
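
The experiment setup above corresponds to a standard PGD-based adversarial training loop. Below is a minimal PyTorch sketch of that reported configuration for CIFAR10 under the L∞ threat model; it illustrates the baseline setup only, not the paper's MLCAT-prototype algorithm, and the `pgd_attack` helper plus the ResNet-18 backbone are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# Standard CIFAR10/CIFAR100 augmentation: random crop with 4-pixel padding
# and random horizontal flip (SVHN would use T.ToTensor() only).
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder backbone for illustration only.
model = torchvision.models.resnet18(num_classes=10).to(device)

# SGD with momentum 0.9, weight decay 5e-4, initial lr 0.1,
# divided by 10 at epochs 100 and 150 over 200 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1)
criterion = nn.CrossEntropyLoss()


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L_inf PGD: eps = 8/255; alpha = 2/255 for CIFAR10/100 (1/255 for SVHN).
    Training uses steps=10; evaluation would use steps=20 (PGD-20)."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = criterion(model(torch.clamp(x + delta, 0, 1)), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return torch.clamp(x + delta, 0, 1).detach()


for epoch in range(200):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)  # 10-step PGD training examples
        optimizer.zero_grad()
        criterion(model(x_adv), y).backward()
        optimizer.step()
    scheduler.step()
```

Robust accuracy would then be measured on the test set with the same `pgd_attack` at 20 steps (PGD-20) and with AutoAttack, per the evaluation protocol quoted above.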