Concurrent Adversarial Learning for Large-Batch Training
Authors: Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that ConAdv can successfully increase the batch size of ResNet-50 training on ImageNet while maintaining high accuracy. This is the first work that successfully scales the ResNet-50 training batch size to 96K. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, National University of Singapore; (2) Department of Computer Science, University of California, Los Angeles; (3) Department of Computer Science and Engineering, Hong Kong University of Science and Technology |
| Pseudocode | Yes | Algorithm 1 (ConAdv). For t = 1, …, T: for each x_i ∈ B^k_{c,t}, compute the loss L(θ_t; x_i, y_i) using the main BN and L^k_a(θ_t; x̂_i(θ_{t−τ}), y_i) using the adversarial BN, giving L_B(θ_t) = E_{B^k_{c,t}}[L(θ_t; x_i, y_i)] + E_{B^k_{a,t}}[L(θ_t; x̂_i(θ_{t−τ}), y_i)]; minimize L_B(θ_t) and obtain g^k_t(θ_t). Concurrently, for each x_i ∈ B^k_{c,t+τ}, calculate the adversarial gradient g^k_a(θ_t) on B^k_{c,t+τ} and obtain the adversarial examples (x̂_i(θ_t), y_i). Aggregate ĝ_t(θ_t) = (1/K) Σ_{k=1}^{K} ĝ^k_t(θ_t) and update the weights θ_{t+1} on the parameter server. (A sequential Python sketch of this loop is given below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the described methodology. |
| Open Datasets | Yes | The dataset we used in this paper is ImageNet-1k, which consists of 1.28 million images for training and 50k images for testing. |
| Dataset Splits | No | The paper specifies 1.28 million images for training and 50k images for testing, but does not explicitly mention a separate validation set or its size. |
| Hardware Specification | Yes | We use TPU-v3 for all our experiments and the same setting as the baseline... we use a large enough distributed system to train the model with the batch size of 512, 1k and 2k on TPU v3-128, TPU v3-256 and TPU v3-512, respectively. |
| Software Dependencies | No | The paper mentions optimizers like LARS and ADAM, and data augmentation methods like AutoAug, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Implementation Details. We use TPU-v3 for all our experiments and the same setting as the baseline. We consider 90-epoch training for ResNet-50. For data augmentation, we mainly consider AutoAug (AA). In addition, we use LARS (You et al., 2017) to train all the models. Finally, for adversarial training, we always use a 1-step PGD attack with random initialization... Appendix A.4 HYPERPARAMETERS: More specifically, our main hyperparameters are shown in Table 6. Table 6: Hyperparameters of ResNet-50 on ImageNet. Includes: Peak LR, Epoch, Weight Decay, Warmup, LR decay, Optimizer, Momentum, Label Smoothing. (Simplified sketches of the 1-step PGD attack and the LARS update also appear below the table.) |
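
As a reading aid, here is a minimal, sequential Python/PyTorch sketch of the ConAdv loop quoted in the Pseudocode row above, with τ = 1. It is an assumption-laden illustration, not the authors' code: the real method runs adversarial-example generation concurrently with the weight update across TPU workers, keeps separate main/adversarial batch-norm statistics, and aggregates gradients over K workers, none of which is modeled here. `model`, `loader`, `optimizer`, and the ε value are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def one_step_pgd(model, x, y, eps=4 / 255):
    """1-step PGD with random initialization (as described in the paper);
    the epsilon value here is an illustrative assumption."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    (grad,) = torch.autograd.grad(loss, delta)
    # One signed-gradient step, projected back into the eps-ball and [0, 1].
    x_adv = x + (delta + eps * grad.sign()).clamp(-eps, eps)
    return x_adv.clamp(0.0, 1.0).detach()

def train_conadv(model, loader, optimizer, steps):
    """Sequential simulation of ConAdv with a one-step delay (tau = 1)."""
    it = iter(loader)
    x_next, y_next = next(it)
    # Adversarial examples for the first step come from the initial weights.
    x_adv, y_adv = one_step_pgd(model, x_next, y_next), y_next
    for _ in range(steps):
        x, y = x_next, y_next
        try:
            x_next, y_next = next(it)
        except StopIteration:
            break
        # L_B(theta_t): clean-batch loss plus the loss on adversarial examples
        # generated earlier with the stale weights theta_{t - tau}.
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y_adv)
        optimizer.zero_grad()
        loss.backward()
        # In ConAdv this generation runs concurrently with the update below;
        # here it simply runs before optimizer.step() so it sees theta_t.
        x_adv, y_adv = one_step_pgd(model, x_next, y_next), y_next
        optimizer.step()
```

Because the adversarial examples for step t + τ depend only on the stale weights θ_t, the generation can be moved onto separate devices or overlapped with the backward pass and weight update, which is the source of ConAdv's speedup over sequential adversarial training.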
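The Experiment Setup row also cites LARS (You et al., 2017) as the optimizer. Below is a heavily simplified, hedged sketch of the layer-wise trust-ratio scaling that defines LARS; the trust coefficient, the warmup-plus-decay schedule from Table 6, and the usual exclusion of bias/BN parameters from scaling and weight decay are omitted or reduced to illustrative defaults, so this shows the update rule rather than the paper's implementation.

```python
import torch

class SimpleLARS:
    """Minimal layer-wise adaptive rate scaling (LARS) with momentum."""

    def __init__(self, params, lr, momentum=0.9, weight_decay=1e-4, eta=0.001):
        self.params = [p for p in params]
        self.lr, self.momentum = lr, momentum
        self.wd, self.eta = weight_decay, eta
        self.buf = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def step(self):
        for p, v in zip(self.params, self.buf):
            if p.grad is None:
                continue
            g = p.grad + self.wd * p  # L2 weight decay folded into the gradient
            w_norm, g_norm = float(p.norm()), float(g.norm())
            # Layer-wise trust ratio: rescale the global LR by ||w|| / ||g||.
            trust = self.eta * w_norm / (g_norm + 1e-9) if w_norm > 0 else 1.0
            v.mul_(self.momentum).add_(g, alpha=self.lr * trust)
            p.add_(-v)

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.zero_()
```

In the sketch above, `SimpleLARS(model.parameters(), lr=peak_lr)` would stand in for the hypothetical `optimizer` used in the ConAdv loop; the per-layer trust ratio is what lets the large-batch schedule in Table 6 use a high peak learning rate without destabilizing layers whose gradient norms are large relative to their weight norms.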