What Makes Better Augmentation Strategies? Augment Difficult but Not too Different
Authors: Jaehyung Kim, Dongyeop Kang, Sungsoo Ahn, Jinwoo Shin
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the proposed augmentation policy learning scheme on various text classification datasets and GLUE benchmark (Wang et al., 2019), where our method consistently improves over the recent state-of-the-art augmentation schemes by successfully discovering the effective augmentation methods for each task. |
| Researcher Affiliation | Academia | Jaehyung Kim1, Dongyeop Kang2, Sungsoo Ahn3, Jinwoo Shin1 1 Korea Advanced Institute of Science and Technology (KAIST) 2 University of Minnesota (UMN) 3 Pohang University of Science and Technology (POSTECH) |
| Pseudocode | Yes | Algorithm 1 Learning to augment difficult, but not too different (DND) |
| Open Source Code | Yes | We also provide our code in the supplementary material. |
| Open Datasets | Yes | For the text classification task, we use the following benchmark datasets: (1) News20 (Lang, 1995), (2) Review50 (Chen & Liu, 2014), and (3) CLINC150 (Larson et al., 2019) for topic classification, (4) IMDB (Maas et al., 2011) and (5) SST-5 (Socher et al., 2013) for sentiment classification, and (6) TREC (Li & Roth, 2002) for question type classification. |
| Dataset Splits | Yes | For datasets without given validation data, we use 10% of its training samples for the validation. |
| Hardware Specification | Yes | In our experiments, we use a single GPU (NVIDIA TITAN Xp) and 8 CPU cores (Intel Xeon E5-2630 v4). |
| Software Dependencies | No | All the used packages are along with the code. |
| Experiment Setup | Yes | All the experiments are conducted by fine-tuning RoBERTa-base (Liu et al., 2019) using Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate 1e-5 and the default hyperparameters of Adam. For the text classification tasks, the model is fine-tuned using the specified augmentation method with batch size 8 for 15 epochs. For GLUE benchmark task, we commonly use batch size 16, except RTE task with batch size 8 following (Aghajanyan et al., 2021). |
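The reported setup and the 10% validation hold-out can be summarized in a minimal Python sketch. This is an illustrative reconstruction of the hyperparameters quoted above, not the authors' released code; all names (`FINETUNE_CONFIG`, `split_train_valid`) are our own.

```python
# Hyperparameters as reported in the paper's experiment setup
# (hedged sketch; names are illustrative, not from the authors' code).
FINETUNE_CONFIG = {
    "model": "roberta-base",            # RoBERTa-base (Liu et al., 2019)
    "optimizer": "Adam",                # default Adam hyperparameters
    "learning_rate": 1e-5,              # fixed learning rate
    "epochs": 15,                       # text classification fine-tuning
    "batch_size": {
        "text_classification": 8,
        "glue_default": 16,
        "glue_rte": 8,                  # RTE follows Aghajanyan et al., 2021
    },
}


def split_train_valid(samples, valid_fraction=0.1):
    """Hold out a fraction of training samples for validation, as the paper
    does for datasets without a given validation split (10%)."""
    n_valid = int(len(samples) * valid_fraction)
    return samples[n_valid:], samples[:n_valid]
```

For example, `split_train_valid(list(range(100)))` returns a 90-sample training list and a 10-sample validation list.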