Enhancing Adversarial Defense by k-Winners-Take-All

Authors: Chang Xiao, Peilin Zhong, Changxi Zheng

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test k-WTA activation on various network structures optimized by a training method, be it adversarial training or not. In all cases, the robustness of k-WTA networks outperforms that of traditional networks under white-box attacks. We conducted extensive experiments on multiple datasets under different network architectures, including ResNet (He et al., 2016), DenseNet (Huang et al., 2017), and Wide ResNet (Zagoruyko & Komodakis, 2016), that are optimized by regular training as well as various adversarial training methods (Madry et al., 2017; Zhang et al., 2019; Shafahi et al., 2019b).
Researcher Affiliation | Academia | Chang Xiao, Peilin Zhong, Changxi Zheng, Columbia University, {chang, peilin, cxz}@cs.columbia.edu
Pseudocode | No | The paper describes the k-WTA activation function and its training procedure in narrative text and mathematical formulas (Equation 1), but it does not include any structured pseudocode or algorithm blocks. (A minimal code sketch of the k-WTA operation appears after this table.)
Open Source Code | Yes | To promote reproducible research, we will release our implementation of k-WTA networks, along with our experiment code, configuration files and pre-trained models (https://github.com/a554b554/kWTA-Activation).
Open Datasets | Yes | We conducted extensive experiments on multiple datasets under different network architectures, including ResNet (He et al., 2016), DenseNet (Huang et al., 2017), and Wide ResNet (Zagoruyko & Komodakis, 2016), that are optimized by regular training as well as various adversarial training methods (Madry et al., 2017; Zhang et al., 2019; Shafahi et al., 2019b). In each setup, we compare the robust accuracy of k-WTA networks with standard ReLU networks on three datasets, CIFAR-10, SVHN, and MNIST.
Dataset Splits | No | The paper uses the CIFAR-10 and SVHN datasets, which have standard train/test splits, but it does not explicitly provide details for a separate validation split, such as percentages, sample counts, or a citation for a predefined validation split.
Hardware Specification | No | The paper states "All experiments are conducted using PyTorch framework." but does not provide any specific hardware details such as GPU or CPU models, memory, or specific computing environments used for the experiments.
Software Dependencies | No | The paper mentions the "PyTorch framework" and "Foolbox (Rauber et al., 2017)" but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | All the ReLU networks are trained with the stochastic gradient descent (SGD) method with momentum 0.9. We use a learning rate of 0.1 from the first to the 50th epoch and 0.01 from the 50th to the 80th epoch. All networks are trained with a batch size of 256. For the PGD attack, we use 40 iterations with random start and a step size of 0.003.
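
The k-WTA activation referenced in the Pseudocode row keeps, for each input sample, only the k largest activation values and zeroes out the rest (Equation 1 in the paper). The PyTorch module below is a minimal sketch of that operation, assuming a per-sample top-k over the flattened activations; the class name KWinnersTakeAll and the sparsity_ratio parameter (the retained fraction k/N) are illustrative choices, not the authors' released implementation.

import torch
import torch.nn as nn

class KWinnersTakeAll(nn.Module):
    """Keep the k largest activations per sample, zero out the rest (a sketch)."""
    def __init__(self, sparsity_ratio=0.1):
        super().__init__()
        self.sparsity_ratio = sparsity_ratio  # assumed parameterization: k / N

    def forward(self, x):
        # Flatten all non-batch dimensions so top-k is computed per sample.
        flat = x.reshape(x.size(0), -1)
        k = max(1, int(self.sparsity_ratio * flat.size(1)))
        # Threshold at the value of the k-th largest activation in each sample.
        topk_vals, _ = flat.topk(k, dim=1)
        threshold = topk_vals[:, -1:].detach()
        # Winners pass through unchanged; all other activations become zero.
        mask = (flat >= threshold).to(flat.dtype)
        return (flat * mask).reshape(x.shape)

In use, such a module would stand in for each ReLU in an architecture such as ResNet, which is how the table's comparison between k-WTA and ReLU networks is framed.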
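The experiment setup in the last row translates into a short configuration. The snippet below is a hedged PyTorch sketch of it: SGD with momentum 0.9, learning rate 0.1 dropped to 0.01 at epoch 50, and an L-infinity PGD attack with 40 iterations, random start, and step size 0.003. The model, the data loader (where the batch size of 256 would be set), and the perturbation budget eps are assumptions not stated in this row.

import torch
import torch.nn.functional as F

# SGD with momentum 0.9; learning rate 0.1 for epochs 1-50 and 0.01 for
# epochs 50-80. `model` is assumed to be defined elsewhere.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50], gamma=0.1)

def pgd_attack(model, x, y, eps=8/255, step_size=0.003, steps=40):
    # L-infinity PGD with random start, 40 iterations, and step size 0.003 as
    # reported; eps=8/255 is an assumed budget, and inputs are assumed in [0, 1].
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

Calling scheduler.step() once per training epoch reproduces the reported learning-rate drop at epoch 50.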