Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

Authors: Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments are conducted on both natural language understanding tasks with token-level data augmentation, and image classification tasks with commonly-used image augmentation techniques like random crop and horizontal flip. Empirical results show that the proposed method improves the generalization performance of the model.
Researcher Affiliation | Collaboration | Mingyang Yi 1,2, Lu Hou 3, Lifeng Shang 3, Xin Jiang 3, Qun Liu 3, Zhi-Ming Ma 1,2. 1 University of Chinese Academy of Sciences (yimingyang17@mails.ucas.edu.cn); 2 Academy of Mathematics and Systems Science, Chinese Academy of Sciences (mazm@amt.ac.cn); 3 Huawei Noah's Ark Lab ({houlu3,shang.lifeng,Jiang.Xin,qun.liu}@huawei.com).
Pseudocode | Yes | Algorithm 1: Minimize the Maximal Expected Loss (MMEL). Algorithm 2: Augmented Sample Generation by Greedy Search.
Open Source Code | No | The paper does not contain any statement about making the source code publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | Data. CIFAR (Krizhevsky et al., 2014) is a benchmark dataset for image classification. We use both CIFAR-10 and CIFAR-100 in our experiments... Data. ImageNet (Deng et al., 2009)... Data. GLUE is a benchmark containing various natural language understanding tasks (Wang et al., 2019).
Dataset Splits | Yes | We use both CIFAR-10 and CIFAR-100 in our experiments; both consist of color images, with 50,000 training samples and 10,000 validation samples, from 10 and 100 object classes, respectively.
Hardware Specification | Yes | The time is the training time measured on a single NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions using the AdamW optimizer and the BERT model, but it does not specify version numbers for these or for any other software components or libraries.
Experiment Setup | Yes | Setup. The model we used is ResNet (He et al., 2016) with different depths... We use SGD with momentum to train each model for 200 epochs. The learning rate starts from 0.1 and decays by a factor of 0.2 at epochs 60, 120, and 160. The batch size is 128, and the weight decay is 5e-4. For each x_i, |B(x_i)| = 10. The KL regularization coefficient λ_P is 1.0 for both MMEL-H and MMEL-S. The λ_T in equation (8) for MMEL-S is selected from {0.5, 1.0, 2.0}. Table 7: Hyperparameters of the BERT-base model.
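
Based on the Pseudocode and Experiment Setup entries above, the following is a minimal PyTorch sketch of how the per-example reweighting in Algorithm 1 (MMEL) could look. It is a reading of the quoted setup, not the paper's exact update rule: the closed-form softmax weighting of each augmented copy's loss within its group B(x_i), and the names `mmel_weighted_loss`, `kl_coeff`, and `n_aug`, are assumptions made for illustration.

```python
# Hedged sketch of the per-example reweighting step as read from Algorithm 1 (MMEL).
# Assumption: the maximal expected loss under a KL penalty toward a uniform prior has a
# softmax closed form, so each augmented copy in B(x_i) is weighted by
# softmax(loss / kl_coeff) within its group. All names here are illustrative.
import torch
import torch.nn.functional as F

def mmel_weighted_loss(model, x_aug, y, kl_coeff=1.0):
    """x_aug: (batch, n_aug, C, H, W) augmented copies B(x_i); y: (batch,) labels."""
    b, n_aug = x_aug.shape[:2]
    logits = model(x_aug.flatten(0, 1))                       # (batch * n_aug, n_classes)
    losses = F.cross_entropy(
        logits, y.repeat_interleave(n_aug), reduction="none"
    ).view(b, n_aug)                                          # per-copy loss L(x_ij)
    # Weights that (approximately) maximize the expected loss under the KL constraint;
    # detached so gradients flow only through the weighted loss itself.
    weights = F.softmax(losses.detach() / kl_coeff, dim=1)    # (batch, n_aug)
    return (weights * losses).sum(dim=1).mean()
```

The Experiment Setup entry also fixes most of the CIFAR training configuration. A hedged sketch of that configuration in PyTorch/torchvision follows; the momentum value (0.9), the crop padding (4), and the ResNet-18 stand-in are assumptions not stated in the quoted text, while the epoch count, learning-rate schedule, batch size, and weight decay follow the entry above.

```python
# Hedged sketch of the quoted CIFAR-10 training configuration (values not quoted above
# are marked as assumptions in the comments).
import torch
import torchvision
import torchvision.transforms as T

train_tf = T.Compose([
    T.RandomCrop(32, padding=4),   # random crop; padding=4 is a common default (assumed)
    T.RandomHorizontalFlip(),      # horizontal flip
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)  # stand-in; the paper uses ResNets of different depths
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.2)
# Train for 200 epochs, calling scheduler.step() once per epoch.
```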