Regularizing Deep Networks Using Efficient Layerwise Adversarial Training

Authors: Swami Sankaranarayanan, Arpit Jain, Rama Chellappa, Ser Nam Lim

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use these perturbations to train very deep models such as ResNets and Wide ResNets and show improvement in performance across datasets of different sizes such as CIFAR-10, CIFAR-100 and ImageNet. Our ablative experiments show that the proposed approach not only provides stronger regularization compared to Dropout but also improves adversarial robustness comparable to traditional adversarial training approaches.
Researcher Affiliation | Collaboration | Swami Sankaranarayanan, University of Maryland, College Park, MD, swamiviv@umiacs.umd.edu; Arpit Jain, GE Global Research, Niskayuna, NY, arpit.jain@ge.com; Rama Chellappa, University of Maryland, College Park, MD, rama@umiacs.umd.edu; Ser Nam Lim, GE Global Research, Niskayuna, NY, limser@ge.com
Pseudocode | Yes | Algorithm 1: Efficient layerwise adversarial training procedure for improved regularization. (A rough sketch of the idea is given after the table.)
Open Source Code | No | The paper states: 'For the ResNet networks, we use the publicly available torch implementation (Res 2017). For the VGG architecture, we use a publicly available implementation which consists of Batch Normalization (VGG 2017). For AlexNet: We used the publicly available implementation from the torch platform (Ale 2017)'. These refer to implementations of the base models they used, not the source code for their proposed method.
Open Datasets | Yes | We use these perturbations to train very deep models such as ResNets and Wide ResNets and show improvement in performance across datasets of different sizes such as CIFAR-10, CIFAR-100 and ImageNet. Krizhevsky, A., and Hinton, G. 2009. Learning multiple layers of features from tiny images, https://www.cs.toronto.edu/~kriz/cifar.html. (A dataset-loading sketch is given after the table.)
Dataset Splits | Yes | ImageNet Experiment: To test the applicability of our regularization approach over a large scale dataset, we conducted an experiment using the ImageNet dataset (train: 1.2M images, val: 50K images).
Hardware Specification | No | The paper does not specify the hardware used for its experiments (no GPU/CPU models, processor speeds, memory amounts, or other machine details).
Software Dependencies | No | The paper mentions using a 'torch implementation' for various models but does not specify version numbers for Torch or any other software dependency.
Experiment Setup | Yes | For all the experiments, we use the SGD solver with Nesterov momentum of 0.9. The base learning rate is 0.1 and it is dropped by a factor of 5 every 60 epochs in case of CIFAR-100 and every 50 epochs in case of CIFAR-10. The total training duration is 300 epochs. We employ random flipping as a data augmentation procedure and standard mean/std preprocessing was applied conforming to the original implementations. (This setup is sketched in code after the table.)
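
Since no source code is released, the Pseudocode row can only point to Algorithm 1. As a rough illustration of the idea that algorithm describes (perturbing intermediate activations with the sign of gradients cached from the previous mini-batch, so no extra forward/backward pass is needed), here is a minimal PyTorch sketch; the toy network, the choice of perturbed layer, and the scale eps are illustrative assumptions, and the original work used Lua Torch rather than PyTorch.

```python
# Hypothetical sketch of layerwise adversarial perturbations from cached gradients.
# Network, perturbed layer, and eps are illustrative; this is not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbedNet(nn.Module):
    def __init__(self, num_classes=10, eps=0.01):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU())
        self.head = nn.Linear(128, num_classes)
        self.eps = eps
        self.cached_grad = None  # gradient w.r.t. block1's output, saved from the previous batch

    def forward(self, x, adversarial=False):
        h = self.block1(x)
        if adversarial and self.cached_grad is not None and self.cached_grad.shape == h.shape:
            # Perturb the intermediate activation with the sign of the cached gradient.
            h = h + self.eps * self.cached_grad.sign()
        h.retain_grad()              # keep the gradient so it can be cached after backward()
        self._last_activation = h
        h = self.block2(h)
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)
        return self.head(h)

def train_step(model, x, y, optimizer):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x, adversarial=True), y)
    loss.backward()
    # Cache the intermediate gradient to use as a perturbation on the next batch.
    model.cached_grad = model._last_activation.grad.detach()
    optimizer.step()
    return loss.item()

# Example usage with a random CIFAR-sized batch:
model = PerturbedNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
for _ in range(2):  # the second step uses the gradient cached from the first
    train_step(model, x, y, opt)
```

The appeal of this scheme, as described in the paper, is efficiency: unlike standard adversarial training, no additional gradient computation is needed to generate the perturbations.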
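The datasets named in the Open Datasets row are all public. A loading sketch using torchvision is shown below purely as an assumption for illustration (the authors used Lua Torch loaders); ImageNet (ILSVRC-2012) must be downloaded separately before use.

```python
# Sketch of obtaining the public CIFAR datasets via torchvision (illustrative only).
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
cifar10_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100(root="./data", train=True, download=True, transform=to_tensor)
```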
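Finally, the Experiment Setup row translates into the following minimal PyTorch sketch (again an assumption, since the paper used Lua Torch). The hyperparameters follow the quoted text: SGD with Nesterov momentum 0.9, base learning rate 0.1 divided by 5 every 60 epochs (CIFAR-100) or every 50 epochs (CIFAR-10), 300 epochs total, random flipping for augmentation, and mean/std normalization; the normalization statistics below are the commonly used CIFAR-10 values, not numbers from the paper.

```python
# Minimal sketch of the stated optimization recipe; placeholder model, illustrative only.
import torch
import torch.nn as nn
from torchvision import transforms

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))  # placeholder network

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=1.0 / 5)  # 50 for CIFAR-10

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2470, 0.2435, 0.2616)),
])

for epoch in range(300):
    # ... iterate over the training loader, computing the loss and calling optimizer.step() ...
    scheduler.step()
```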