Concurrent Adversarial Learning for Large-Batch Training

Authors: Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that ConAdv can successfully increase the batch size of ResNet-50 training on ImageNet while maintaining high accuracy. This is the first work that successfully scales the ResNet-50 training batch size to 96K.
Researcher Affiliation | Academia | 1) Department of Computer Science, National University of Singapore; 2) Department of Computer Science, University of California, Los Angeles; 3) Department of Computer Science and Engineering, Hong Kong University of Science and Technology
Pseudocode | Yes | Algorithm 1 ConAdv (per worker k):
    for t = 1, ..., T do
        for x_i in B^k_{c,t} do
            Compute loss: L(θ_t; x_i, y_i) using the main BN, and L^k_a(θ_t; x̂_i(θ_{t-τ}), y_i) using the adversarial BN
            L_B(θ_t) = E_{B^k_{c,t}}[L(θ_t; x_i, y_i)] + E_{B^k_{a,t}}[L^k_a(θ_t; x̂_i(θ_{t-τ}), y_i)]
            Minimize L_B(θ_t) and obtain g^k_t(θ_t)
        end for
        for x_i in B^k_{c,t+τ} do
            Compute the adversarial gradient g^k_a(θ_t) on B^k_{c,t+τ}
            Obtain adversarial examples (x̂_i(θ_t), y_i)
        end for
    end for
    Aggregate: ĝ_t(θ_t) = (1/K) Σ_{k=1}^{K} ĝ^k_t(θ_t)
    Update weights θ_{t+1} on the parameter server
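Since the paper releases no code (see the Open Source Code row below), the following is only a minimal, single-worker PyTorch sketch of the idea behind Algorithm 1. The function names, the eps value, and the use of a single BatchNorm (the paper uses a separate adversarial BN) are illustrative assumptions, not the authors' implementation. The point it shows is that the adversarial examples consumed at step t were generated from stale weights θ_{t-τ}, so in the distributed setting their generation can run concurrently with the main gradient computation.

    # Minimal single-worker sketch of the ConAdv idea (assumed names and values,
    # not the authors' code). 1-step PGD with random initialization is the attack
    # named in the paper; eps = 4/255 is a placeholder.
    import copy
    import torch
    import torch.nn as nn

    def one_step_pgd(model, x, y, eps=4.0 / 255, loss_fn=nn.CrossEntropyLoss()):
        """1-step PGD attack with random initialization."""
        delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
        loss = loss_fn(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        # One ascent step, then project back into the eps-ball around x.
        delta = torch.clamp(delta + eps * grad.sign(), -eps, eps)
        return (x + delta).detach()

    def conadv_train(model, loader, optimizer, loss_fn=nn.CrossEntropyLoss()):
        batches = iter(loader)
        x, y = next(batches)
        # Adversarial examples for the first step come from the initial weights.
        x_adv = one_step_pgd(model, x, y)

        for x_next, y_next in batches:
            # In ConAdv this block runs concurrently with the update below, on a
            # snapshot of the current weights theta_t; it is sequential here for
            # clarity. Its outputs are consumed one step later, i.e. they were
            # generated from weights that are stale by then.
            stale_model = copy.deepcopy(model)
            x_adv_next = one_step_pgd(stale_model, x_next, y_next)

            # Main update at step t: clean loss plus the loss on adversarial
            # examples generated from the stale weights. (The paper routes the
            # adversarial branch through a separate adversarial BN; a single BN
            # keeps the sketch short.)
            loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            x, y, x_adv = x_next, y_next, x_adv_next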
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | The dataset we used in this paper is ImageNet-1k, which consists of 1.28 million images for training and 50k images for testing.
Dataset Splits | No | The paper specifies 1.28 million images for training and 50k images for testing, but does not explicitly mention a separate validation set or its size.
Hardware Specification | Yes | We use TPU-v3 for all our experiments and the same setting as the baseline... we use a large enough distributed system to train the model with the batch size of 512, 1k and 2k on TPU v3-128, TPU v3-256 and TPU v3-512, respectively.
Software Dependencies | No | The paper mentions optimizers like LARS and ADAM, and data augmentation methods like AutoAug, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Implementation Details: We use TPU-v3 for all our experiments and the same setting as the baseline. We consider 90-epoch training for ResNet-50. For data augmentation, we mainly consider AutoAug (AA). In addition, we use LARS (You et al., 2017) to train all the models. Finally, for adversarial training, we always use a 1-step PGD attack with random initialization... Appendix A.4, Hyperparameters: More specifically, our main hyperparameters are shown in Table 6. Table 6 (Hyperparameters of ResNet-50 on ImageNet) lists: Peak LR, Epoch, Weight Decay, Warmup, LR decay, Optimizer, Momentum, Label Smoothing.
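The excerpt names the fields of Table 6 but not their values. Below is a hypothetical sketch of how the quoted setup could be collected into a run configuration; the dictionary name is an assumption, None marks values the excerpt does not report, and only the entries quoted above (90 epochs, LARS, AutoAug, the 1-step PGD attack, and the TPU slices from the Hardware row) come from the paper.

    # Hypothetical run-configuration sketch assembled from the quoted setup.
    # Field names follow Table 6; None marks values not reported in the excerpt.
    resnet50_conadv_config = {
        "model": "ResNet-50",
        "dataset": "ImageNet-1k",          # 1.28M training / 50k test images
        "epochs": 90,                      # "90-epoch training for ResNet-50"
        "optimizer": "LARS",               # You et al., 2017
        "momentum": None,                  # listed in Table 6, value not quoted
        "peak_lr": None,                   # listed in Table 6, value not quoted
        "warmup": None,                    # listed in Table 6, value not quoted
        "lr_decay": None,                  # listed in Table 6, value not quoted
        "weight_decay": None,              # listed in Table 6, value not quoted
        "label_smoothing": None,           # listed in Table 6, value not quoted
        "data_augmentation": "AutoAug",
        "adversarial_attack": "1-step PGD with random initialization",
        "hardware": {                      # batch size -> TPU slice (Hardware row)
            512: "TPU v3-128",
            1024: "TPU v3-256",
            2048: "TPU v3-512",
        },
    }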