What Makes Better Augmentation Strategies? Augment Difficult but Not too Different

Authors: Jaehyung Kim, Dongyeop Kang, Sungsoo Ahn, Jinwoo Shin

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of the proposed augmentation policy learning scheme on various text classification datasets and GLUE benchmark (Wang et al., 2019), where our method consistently improves over the recent state-of-the-art augmentation schemes by successfully discovering the effective augmentation methods for each task.
Researcher Affiliation | Academia | Jaehyung Kim¹, Dongyeop Kang², Sungsoo Ahn³, Jinwoo Shin¹; ¹Korea Advanced Institute of Science and Technology (KAIST), ²University of Minnesota (UMN), ³Pohang University of Science and Technology (POSTECH)
Pseudocode | Yes | Algorithm 1: Learning to augment difficult, but not too different (DND). (An illustrative sketch of this criterion follows the table.)
Open Source Code | Yes | We also provide our code in the supplementary material.
Open Datasets | Yes | For the text classification task, we use the following benchmark datasets: (1) News20 (Lang, 1995), (2) Review50 (Chen & Liu, 2014), and (3) CLINC150 (Larson et al., 2019) for topic classification, (4) IMDB (Maas et al., 2011) and (5) SST-5 (Socher et al., 2013) for sentiment classification, and (6) TREC (Li & Roth, 2002) for question type classification.
Dataset Splits | Yes | For datasets without given validation data, we use 10% of their training samples for validation. (A split sketch follows the table.)
Hardware Specification | Yes | In our experiments, we use a single GPU (NVIDIA TITAN Xp) and 8 CPU cores (Intel Xeon E5-2630 v4).
Software Dependencies | No | All the packages used are provided along with the code.
Experiment Setup | Yes | All the experiments are conducted by fine-tuning RoBERTa-base (Liu et al., 2019) using the Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate of 1e-5 and the default hyperparameters of Adam. For the text classification tasks, the model is fine-tuned using the specified augmentation method with batch size 8 for 15 epochs. For the GLUE benchmark tasks, we commonly use batch size 16, except for the RTE task, where we use batch size 8, following Aghajanyan et al. (2021). (A fine-tuning sketch follows the table.)
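
The Algorithm 1 pseudocode itself is not reproduced in this report, but its name states the selection criterion: keep augmentations that are difficult for the current model, yet not too different from the original sample. The following is a minimal illustrative sketch of that criterion only, not the paper's actual DND algorithm; `rank_augmentations`, the cosine-similarity measure, and the `sim_threshold` value are all assumed names and choices.

```python
import torch
import torch.nn.functional as F

def rank_augmentations(orig_emb, aug_emb, aug_losses, sim_threshold=0.8):
    """Rank augmentation candidates for one original sample: prefer
    high-loss ("difficult") candidates, but drop those whose embedding
    drifts too far from the original ("too different")."""
    # Cosine similarity between the original and each augmented embedding.
    sims = F.cosine_similarity(orig_emb.unsqueeze(0), aug_emb, dim=-1)
    # Candidates below the (hypothetical) similarity cutoff are excluded.
    scores = torch.where(sims >= sim_threshold,
                         aug_losses,
                         torch.full_like(aug_losses, float("-inf")))
    # Indices of candidates, most difficult admissible ones first.
    return torch.argsort(scores, descending=True)

# Toy usage: 4 candidates with random embeddings and losses.
orig = torch.randn(768)
augs = torch.randn(4, 768)
losses = torch.rand(4)
print(rank_augmentations(orig, augs, losses))
```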
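
The 10% validation split is straightforward to reproduce. A minimal sketch using scikit-learn; the stratification and random seed are assumptions, since the paper does not state them:

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for a dataset that ships without a validation split.
texts = [f"example {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

# Hold out 10% of the training samples as the validation set.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=42)
```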
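
The experiment setup entry pins down the core fine-tuning configuration. Below is a minimal sketch with Hugging Face transformers for the text classification setting (batch size 8, 15 epochs, fixed learning rate 1e-5, default Adam hyperparameters); the toy batch and the 6-class head (as in TREC's coarse labels) are only there to make the sketch runnable and are not from the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=6)  # 6 coarse classes, as in TREC
# Fixed learning rate 1e-5 with Adam's default hyperparameters, per the report.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

texts = ["what is the capital of France ?"] * 8  # batch size 8
labels = torch.zeros(8, dtype=torch.long)        # dummy labels
batch = tokenizer(texts, return_tensors="pt", padding=True)

model.train()
for epoch in range(15):  # 15 epochs for the text classification tasks
    optimizer.zero_grad()
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
```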