reproducibilityindex.ai

Robustness and Accuracy Could Be Reconcilable by (Proper) Definition

Authors: Tianyu Pang, Min Lin, Xiao Yang, Jun Zhu, Shuicheng Yan

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Sec. 5, we validate the effectiveness of replacing KL divergence with distance-based metrics (and their variants), developed from the analyses of SCORE. We improve the state-of-the-art adversarial training (AT) methods under AutoAttack (Croce and Hein, 2020) and achieve top-rank performance with 1M DDPM-generated data on the CIFAR-10 and CIFAR-100 leaderboards of RobustBench (Croce et al., 2020).
Researcher Affiliation	Collaboration	¹Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint Center for ML, Tsinghua University. ²Sea AI Lab, Singapore.
Pseudocode	No	No pseudocode or algorithm blocks were found.
Open Source Code	Yes	Code is at https://github.com/P2333/SCORE.
Open Datasets	Yes	We improve the state-of-the-art AT methods under AutoAttack (Croce and Hein, 2020) and achieve top-rank performance with 1M DDPM-generated data on the CIFAR-10 and CIFAR-100 leaderboards of RobustBench (Croce et al., 2020).
Dataset Splits	Yes	For our methods, we report the results on the checkpoint with the highest value of PGD-10 (SE) accuracy on a separate validation set, similarly to Rice et al. (2020).
Hardware Specification	No	The paper mentions using 'large models' and notes 'limited computational resources' but does not provide specific hardware details such as GPU or CPU models used for experiments.
Software Dependencies	No	The paper mentions 'PyTorch implementation' and 'SGD momentum optimizer' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	In training, we use the SGD momentum optimizer with batch size 128 and weight decay 5e-4. We build on the PGD-AT (Madry et al., 2018) and TRADES (Zhang et al., 2019) frameworks. The training attack is 10-step PGD with step size α = 2/255 for the ℓ∞ threat model and α = 16/255 for the ℓ2 threat model. Training runs for 110 epochs, with the learning rate decayed by a factor of 0.1 at epochs 100 and 105. The hyperparameter β = 6 in the TRADES experiments.
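The Research Type and Experiment Setup rows describe swapping the KL divergence in adversarial-training objectives for a distance-based metric. A minimal plain-Python sketch of that swap on single probability vectors follows; it is not the authors' code from the SCORE repository. It assumes a TRADES-style objective (clean cross-entropy plus β times a regularizer between clean and adversarial predictions, with β = 6 as in the setup) and uses the squared ℓ2 distance between softmax outputs as one illustrative distance-based metric. All function names are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two probability vectors; eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def squared_error(p, q):
    # Squared Euclidean distance between probability vectors: an assumed,
    # illustrative example of a distance-based metric.
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def trades_style_loss(logits_clean, logits_adv, label, beta=6.0,
                      divergence=kl_divergence):
    # Cross-entropy on the clean input plus beta times a divergence between
    # the clean and adversarial predictions. Replacing the default KL with a
    # distance-based metric is the change this sketch illustrates.
    p_clean = softmax(logits_clean)
    p_adv = softmax(logits_adv)
    ce = -math.log(p_clean[label])
    return ce + beta * divergence(p_clean, p_adv)


clean = [2.0, 0.5, -1.0]
adv = [1.0, 1.2, -0.5]
print(trades_style_loss(clean, adv, label=0))
print(trades_style_loss(clean, adv, label=0, divergence=squared_error))
```

Passing `divergence=squared_error` instead of the default is the only difference between the two objectives, which keeps the regularizer easy to ablate.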
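The Dataset Splits row selects the checkpoint with the highest PGD-10 accuracy on a separate validation set, following Rice et al. (2020). That rule reduces to an arg-max over per-epoch validation scores. The sketch below assumes the scores are logged as hypothetical `(epoch, accuracy)` pairs and does not cover how the robust accuracy itself is measured.

```python
def select_best_checkpoint(history):
    # `history` holds hypothetical (epoch, val_robust_acc) pairs, one per
    # evaluated checkpoint, where val_robust_acc is PGD-10 accuracy on the
    # held-out validation split.
    if not history:
        raise ValueError("no checkpoints were evaluated")
    # max() returns the first maximal item, so ties go to the earlier epoch.
    return max(history, key=lambda record: record[1])
```

Because the validation split is disjoint from the test set, this early-stopping choice does not leak test information.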
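Two parts of the training recipe in the Experiment Setup row can be pinned down in code: the step learning-rate schedule and the 10-step ℓ∞ PGD attack with α = 2/255. The sketch below is plain Python over flat lists, not the authors' PyTorch implementation. It assumes 0-indexed epochs, leaves the initial learning rate and the perturbation budget ε to the caller because the setup does not state them, omits PGD's random start, and covers only the ℓ∞ case (the ℓ2 variant with α = 16/255 would step along the ℓ2-normalized gradient and project onto an ℓ2 ball instead). The function names and the `grad_fn` callback are hypothetical.

```python
def learning_rate(epoch, base_lr):
    # Step schedule from the setup: multiply by 0.1 at epoch 100 and again at
    # epoch 105 (110 epochs total). Epochs are assumed 0-indexed; base_lr is
    # not given in the setup, so the caller supplies it.
    if epoch >= 105:
        return base_lr * 0.01
    if epoch >= 100:
        return base_lr * 0.1
    return base_lr


def sign(v):
    # Returns 1, 0, or -1.
    return (v > 0) - (v < 0)


def pgd_linf(x, grad_fn, eps, alpha=2 / 255, steps=10):
    # Untargeted l_inf PGD without random start: step along the sign of the
    # loss gradient, then project back into the eps-ball around x and into
    # the valid pixel range [0, 1]. grad_fn(x_adv) is a hypothetical callback
    # returning the gradient of the training loss w.r.t. the input.
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv
```

In a real training loop, `grad_fn` would be the input gradient produced by autograd, and the schedule would typically be handled by a framework scheduler rather than a hand-written function.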