Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness

Authors: Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically demonstrate several attractive merits of applying the MMC loss. We experiment on the widely used MNIST, CIFAR-10, and CIFAR-100 datasets (Krizhevsky & Hinton, 2009; LeCun et al., 1998).
Researcher Affiliation | Collaboration | Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu. Dept. of Comp. Sci. & Tech., BNRist Center, Institute for AI, Tsinghua University; RealAI. {pty17,xu-k16,dyp17,du-c14}@mails.tsinghua.edu.cn, {ningchen,dcszj}@tsinghua.edu.cn
Pseudocode | Yes | We give the generation algorithm for crafting the Max-Mahalanobis centers in Algorithm 1, proposed by Pang et al. (2018). (A sketch of this algorithm is given below the table.)
Open Source Code | Yes | The codes are provided in https://github.com/P2333/Max-Mahalanobis-Training.
Open Datasets | Yes | We experiment on the widely used MNIST, CIFAR-10, and CIFAR-100 datasets (Krizhevsky & Hinton, 2009; LeCun et al., 1998).
Dataset Splits | No | The paper uses the standard MNIST, CIFAR-10, and CIFAR-100 datasets and states the number of training epochs, but it does not explicitly provide the train/validation/test splits (e.g., percentages, sample counts, or a citation to a specific split protocol) needed to reproduce the experiments.
Hardware Specification | Yes | Most of our experiments are conducted on the NVIDIA DGX-1 server with eight Tesla P100 GPUs.
Software Dependencies | No | The paper mentions the momentum SGD optimizer but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | For each training loss with or without the AT mechanism, we apply the momentum SGD (Qian, 1999) optimizer with the initial learning rate of 0.01, and train for 40 epochs on MNIST and 200 epochs on CIFAR-10 and CIFAR-100. The learning rate decays by a factor of 0.1 at 100 and 150 epochs, respectively. When applying the AT mechanism (Madry et al., 2018), the adversarial examples for training are crafted by 10-step targeted or untargeted PGD with ϵ = 8/255. ... we choose the perturbation ϵ = 8/255 and 16/255, with a step size of 2/255. (A sketch of this training and attack setup also follows the table.)
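
For concreteness, the generation procedure cited in the Pseudocode row (Algorithm 1, from Pang et al., 2018) constructs L unit-norm centers whose pairwise inner products all equal -1/(L-1), then scales them to a preset norm. Below is a minimal NumPy sketch under that reading; the function name, the `c_norm` parameter, and the clipping safeguard are our own choices, and the authors' released code is the authoritative implementation.

```python
import numpy as np

def generate_mm_centers(num_classes, dim, c_norm=1.0):
    """Sketch of Max-Mahalanobis center generation (Algorithm 1, Pang et al.,
    2018): L centers in R^dim with equal norms and all pairwise inner
    products equal to -c_norm / (L - 1)."""
    assert dim >= num_classes - 1, "need dim >= L - 1 for equidistant centers"
    mu = np.zeros((num_classes, dim))
    mu[0, 0] = 1.0  # first center: the first standard basis vector
    for i in range(1, num_classes):
        for j in range(i):
            # choose mu[i, j] so that <mu_i, mu_j> = -1 / (L - 1)
            mu[i, j] = -(1.0 / (num_classes - 1) + mu[i] @ mu[j]) / mu[j, j]
        # set the i-th coordinate so the new center has unit length
        mu[i, i] = np.sqrt(max(1.0 - np.sum(mu[i] ** 2), 0.0))
    return np.sqrt(c_norm) * mu  # scale all centers to squared norm c_norm
```

For instance, `generate_mm_centers(10, 256)` returns ten maximally and equally separated centers for a 256-dimensional feature layer (the dimension and norm here are illustrative, not the paper's settings); this equidistant geometry is the property the MMC loss relies on.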
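
The quoted Experiment Setup row pairs a momentum-SGD schedule with a PGD inner loop for adversarial training. The following PyTorch sketch shows one plausible reading of it; the momentum coefficient (0.9), the [0, 1] pixel range, and the placeholder model are assumptions not stated in the quote, and the paper's own code should be treated as ground truth.

```python
import torch
import torch.nn as nn

def pgd_attack(model, loss_fn, x, y, eps=8/255, step_size=2/255, num_steps=10):
    """Untargeted L-infinity PGD with the quoted parameters (10 steps,
    eps = 8/255, step size 2/255). For the targeted variant, pass target
    labels as y and subtract the gradient step instead of adding it."""
    x_adv = x.clone().detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()   # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # assumed [0, 1] pixel range
    return x_adv.detach()

# Placeholder model standing in for the paper's network; momentum 0.9 is an
# assumed default, as only the initial learning rate of 0.01 is quoted.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Learning rate decays by a factor of 0.1 at epochs 100 and 150 (CIFAR schedule).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1)
```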