Fast AdvProp
Authors: Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that, compared to the vanilla training baseline, Fast AdvProp is able to further improve model performance on a spectrum of visual benchmarks, without incurring extra training cost. Additionally, our ablations find Fast AdvProp scales better if larger models are used, is compatible with existing data augmentation methods (i.e., Mixup and CutMix), and can be easily adapted to other recognition tasks like object detection. |
| Researcher Affiliation | Academia | Jieru Mei1, Yucheng Han2, Yutong Bai1, Yixiao Zhang1, Yingwei Li1, Xianhang Li3, Alan Yuille1 & Cihang Xie3 — 1Johns Hopkins University, 2Nanyang Technological University, 3UC Santa Cruz |
| Pseudocode | Yes | Algorithm 1: Pseudo code of Fast AdvProp for T epochs, given some radius ϵ, importance re-weight parameter β, learning rate γ, and ratio of adversarial examples padv. |
| Open Source Code | Yes | The code is available here: https://github.com/meijieru/fast_advprop. |
| Open Datasets | Yes | We evaluate model performance on ImageNet classification, and the robustness on different specialized benchmarks including ImageNet-C, ImageNet-R, and Stylized-ImageNet. The ImageNet dataset contains 1.2 million training images and 50000 validation images of 1000 classes. We implement Fast AdvProp on object detection and evaluate it on the COCO dataset (Lin et al., 2014). |
| Dataset Splits | Yes | The ImageNet dataset contains 1.2 million training images and 50000 validation images of 1000 classes. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'a SGD optimizer' and refers to 'ResNet family' architectures, but does not specify any software dependencies (e.g., PyTorch, TensorFlow, or specific library versions) with version numbers. |
| Experiment Setup | Yes | We use the renowned ResNet family (He et al., 2016) as our default architectures. We use a SGD optimizer with momentum 0.9 and train for 105 epochs. The learning rate starts from 0.1 and decays at 30, 60, 90, 100 epochs by 0.1. We use a batch size of 64 per GPU for vanilla training. For the decoupled training setting, we use a batch size of 64/(1 − padv) per GPU, keeping the same 64 batch size per GPU for the original BNs. padv is set to 0.2 if not specified. ... To generate adversarial images, we use the PGD attacker with random initialization. We attack for one step (K = 1) and set the perturbation size to 1.0. ... we set β = 0.5 for halving the importance of the images with random noise and the adversarial images. Additionally, we re-scale the gradient to achieve the 1 : 1 : 1 ratio for ensuring similar updating speed of all parameters. |
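The adversarial-example generation described in the setup row — a one-step (K = 1) PGD attack with random initialization and perturbation size 1.0 — can be sketched as below. This is a minimal NumPy illustration under stated assumptions: the function name, the toy quadratic loss, and the gradient interface are hypothetical, not from the paper's released code.

```python
import numpy as np

def pgd_one_step(x, grad_fn, epsilon=1.0, rng=None):
    """One-step PGD with random initialization, as in the setup above.

    x       : clean input (NumPy array)
    grad_fn : callable returning the gradient of the loss w.r.t. the input
    epsilon : perturbation size (the paper uses 1.0)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random start uniformly inside the epsilon-ball around x.
    delta = rng.uniform(-epsilon, epsilon, size=x.shape)
    # Single ascent step in the sign of the input gradient (FGSM-style).
    delta = delta + epsilon * np.sign(grad_fn(x + delta))
    # Project the perturbation back onto the epsilon-ball.
    delta = np.clip(delta, -epsilon, epsilon)
    return x + delta

# Toy usage with loss = 0.5 * ||x||^2, whose input gradient is x itself.
x = np.array([0.5, -0.3])
x_adv = pgd_one_step(x, grad_fn=lambda v: v, epsilon=1.0)
```

Because the perturbation is clipped to the epsilon-ball before being applied, `x_adv` always stays within L∞ distance `epsilon` of the clean input; with `padv = 0.2`, such examples would make up one fifth of each decoupled training batch.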