Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness

Authors: Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically demonstrate several attractive merits of applying the MMC loss. We experiment on the widely used MNIST, CIFAR-10, and CIFAR-100 datasets (Krizhevsky & Hinton, 2009; LeCun et al., 1998).
Researcher Affiliation | Collaboration | Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu. Dept. of Comp. Sci. & Tech., BNRist Center, Institute for AI, Tsinghua University; RealAI. {pty17,xu-k16,dyp17,du-c14}@mails.tsinghua.edu.cn, {ningchen,dcszj}@tsinghua.edu.cn
Pseudocode | Yes | We give the generation algorithm for crafting the Max-Mahalanobis centers in Algorithm 1, proposed by Pang et al. (2018). (A sketch of this algorithm is given below the table.)
Open Source Code | Yes | The codes are provided in https://github.com/P2333/Max-Mahalanobis-Training.
Open Datasets | Yes | We experiment on the widely used MNIST, CIFAR-10, and CIFAR-100 datasets (Krizhevsky & Hinton, 2009; LeCun et al., 1998).
Dataset Splits | No | The paper uses the standard MNIST, CIFAR-10, and CIFAR-100 datasets and states the number of training epochs, but it does not explicitly provide the train/validation/test splits (e.g., percentages, sample counts, or a citation to a specific split protocol) needed to reproduce the experiments.
Hardware Specification | Yes | Most of our experiments are conducted on the NVIDIA DGX-1 server with eight Tesla P100 GPUs.
Software Dependencies | No | The paper mentions the momentum SGD optimizer but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | For each training loss with or without the AT mechanism, we apply the momentum SGD (Qian, 1999) optimizer with the initial learning rate of 0.01, and train for 40 epochs on MNIST and 200 epochs on CIFAR-10 and CIFAR-100. The learning rate decays by a factor of 0.1 at 100 and 150 epochs, respectively. When applying the AT mechanism (Madry et al., 2018), the adversarial examples for training are crafted by 10-step targeted or untargeted PGD with ϵ = 8/255. ... we choose the perturbation ϵ = 8/255 and 16/255, with a step size of 2/255. (A sketch of this training and attack setup also follows the table.)
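
For concreteness, the generation procedure cited in the Pseudocode row (Algorithm 1, from Pang et al., 2018) constructs L unit-norm centers whose pairwise inner products all equal -1/(L-1), then scales them to a preset norm. Below is a minimal NumPy sketch under that reading; the function name, the `c_norm` parameter, and the clipping safeguard are our own choices, and the authors' released code is the authoritative implementation.

```python
import numpy as np

def generate_mm_centers(num_classes, dim, c_norm=1.0):
    """Sketch of Max-Mahalanobis center generation (Algorithm 1, Pang et al.,
    2018): L centers in R^dim with equal norms and all pairwise inner
    products equal to -c_norm / (L - 1)."""
    assert dim >= num_classes - 1, "need dim >= L - 1 for equidistant centers"
    mu = np.zeros((num_classes, dim))
    mu[0, 0] = 1.0  # first center: the first standard basis vector
    for i in range(1, num_classes):
        for j in range(i):
            # choose mu[i, j] so that <mu_i, mu_j> = -1 / (L - 1)
            mu[i, j] = -(1.0 / (num_classes - 1) + mu[i] @ mu[j]) / mu[j, j]
        # set the i-th coordinate so the new center has unit length
        mu[i, i] = np.sqrt(max(1.0 - np.sum(mu[i] ** 2), 0.0))
    return np.sqrt(c_norm) * mu  # scale all centers to squared norm c_norm
```

For instance, `generate_mm_centers(10, 256)` returns ten maximally and equally separated centers for a 256-dimensional feature layer (the dimension and norm here are illustrative, not the paper's settings); this equidistant geometry is the property the MMC loss relies on.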
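
The quoted Experiment Setup row pairs a momentum-SGD schedule with a PGD inner loop for adversarial training. The following PyTorch sketch shows one plausible reading of it; the momentum coefficient (0.9), the [0, 1] pixel range, and the placeholder model are assumptions not stated in the quote, and the paper's own code should be treated as ground truth.

```python
import torch
import torch.nn as nn

def pgd_attack(model, loss_fn, x, y, eps=8/255, step_size=2/255, num_steps=10):
    """Untargeted L-infinity PGD with the quoted parameters (10 steps,
    eps = 8/255, step size 2/255). For the targeted variant, pass target
    labels as y and subtract the gradient step instead of adding it."""
    x_adv = x.clone().detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()   # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # assumed [0, 1] pixel range
    return x_adv.detach()

# Placeholder model standing in for the paper's network; momentum 0.9 is an
# assumed default, as only the initial learning rate of 0.01 is quoted.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Learning rate decays by a factor of 0.1 at epochs 100 and 150 (CIFAR schedule).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1)
```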