Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness
Authors: Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically demonstrate several attractive merits of applying the MMC loss. We experiment on the widely used MNIST, CIFAR-10, and CIFAR-100 datasets (Krizhevsky & Hinton, 2009; LeCun et al., 1998). |
| Researcher Affiliation | Collaboration | Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu Dept. of Comp. Sci. & Tech., BNRist Center, Institute for AI, Tsinghua University; Real AI {pty17,xu-k16,dyp17,du-c14}@mails.tsinghua.edu.cn, {ningchen,dcszj}@tsinghua.edu.cn |
| Pseudocode | Yes | We give the generation algorithm for crafting the Max-Mahalanobis Centers in Algorithm 1, proposed by Pang et al. (2018). |
| Open Source Code | Yes | The codes are provided in https://github.com/P2333/Max-Mahalanobis-Training. |
| Open Datasets | Yes | We experiment on the widely used MNIST, CIFAR-10, and CIFAR-100 datasets (Krizhevsky & Hinton, 2009; LeCun et al., 1998). |
| Dataset Splits | No | The paper uses standard datasets (MNIST, CIFAR-10, CIFAR-100) and mentions training epochs, but does not explicitly provide the train/validation/test dataset splits (e.g., percentages, sample counts, or citations to specific split methodologies) used to reproduce the experiments. |
| Hardware Specification | Yes | Most of our experiments are conducted on the NVIDIA DGX-1 server with eight Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions the use of the momentum SGD optimizer but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | For each training loss with or without the AT mechanism, we apply the momentum SGD (Qian, 1999) optimizer with the initial learning rate of 0.01, and train for 40 epochs on MNIST, 200 epochs on CIFAR-10 and CIFAR-100. The learning rate decays with a factor of 0.1 at 100 and 150 epochs, respectively. When applying the AT mechanism (Madry et al., 2018), the adversarial examples for training are crafted by 10-steps targeted or untargeted PGD with ϵ = 8/255. ... we choose the perturbation ϵ = 8/255 and 16/255, with the step size of 2/255. |
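The Max-Mahalanobis centers referenced under "Pseudocode" (Algorithm 1, adopted from Pang et al., 2018) place L equidistant class centers on a sphere of radius C, with pairwise inner products of -C²/(L-1). A minimal NumPy sketch of that construction, assuming the standard simplex-vertex recursion (function and variable names here are illustrative, not from the paper's code):

```python
import numpy as np

def mm_centers(C, L, d):
    """Sketch of Max-Mahalanobis center generation (Algorithm 1,
    Pang et al., 2018): L equidistant centers on a radius-C sphere
    in R^d, with pairwise inner products -C^2 / (L - 1)."""
    assert d >= L, "this construction uses the first L coordinates"
    mu = np.zeros((L, d))
    mu[0, 0] = 1.0  # first unit center along e_1
    for i in range(1, L):
        for j in range(i):
            # solve for mu[i, j] so that <mu_i, mu_j> = -1/(L-1);
            # mu_j is zero beyond coordinate j, so only k <= j contribute
            mu[i, j] = -(1.0 / (L - 1) + mu[i] @ mu[j]) / mu[j, j]
        # remaining coordinate makes mu_i a unit vector
        mu[i, i] = np.sqrt(max(0.0, 1.0 - np.linalg.norm(mu[i]) ** 2))
    return C * mu  # scale the unit simplex to radius C
```

For a CIFAR-10-like setting (L = 10 classes, a 256-dimensional feature space), every center has norm C and every pair of centers has the same inner product, which is the equidistance property the MMC loss relies on.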
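The "Experiment Setup" row describes crafting training examples with 10-step PGD under an L∞ budget of ϵ = 8/255 and step size 2/255 (Madry et al., 2018). A hedged NumPy sketch of that attack loop, where `grad_fn` is an assumed callback returning the loss gradient with respect to the input (the paper's actual implementation lives in the linked repository):

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=8 / 255, step=2 / 255, steps=10, targeted=False):
    """Sketch of L_inf PGD (Madry et al., 2018) as described in the setup:
    10 steps, eps = 8/255, step size 2/255. `grad_fn(x_adv)` is assumed to
    return d(loss)/d(input) for the current adversarial candidate."""
    x_adv = x.copy()
    # untargeted PGD ascends the loss; targeted PGD descends the
    # loss toward the chosen target class
    sign = -1.0 if targeted else 1.0
    for _ in range(steps):
        x_adv = x_adv + sign * step * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv
```

With 10 steps of size 2/255, the cumulative update (20/255) exceeds the budget, so the projection onto the ϵ-ball is what ultimately bounds the perturbation; the evaluation setting with ϵ = 16/255 only changes the `eps` argument.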