What Makes Better Augmentation Strategies? Augment Difficult but Not too Different

Authors: Jaehyung Kim, Dongyeop Kang, Sungsoo Ahn, Jinwoo Shin

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of the proposed augmentation policy learning scheme on various text classification datasets and GLUE benchmark (Wang et al., 2019), where our method consistently improves over the recent state-of-the-art augmentation schemes by successfully discovering the effective augmentation methods for each task.
Researcher Affiliation | Academia | Jaehyung Kim¹, Dongyeop Kang², Sungsoo Ahn³, Jinwoo Shin¹; ¹Korea Advanced Institute of Science and Technology (KAIST), ²University of Minnesota (UMN), ³Pohang University of Science and Technology (POSTECH)
Pseudocode | Yes | Algorithm 1: Learning to augment difficult, but not too different (DND). (An illustrative sketch of this criterion follows the table.)
Open Source Code | Yes | We also provide our code in the supplementary material.
Open Datasets | Yes | For the text classification task, we use the following benchmark datasets: (1) News20 (Lang, 1995), (2) Review50 (Chen & Liu, 2014), and (3) CLINC150 (Larson et al., 2019) for topic classification, (4) IMDB (Maas et al., 2011) and (5) SST-5 (Socher et al., 2013) for sentiment classification, and (6) TREC (Li & Roth, 2002) for question type classification.
Dataset Splits | Yes | For datasets without given validation data, we use 10% of their training samples for validation. (A split sketch follows the table.)
Hardware Specification | Yes | In our experiments, we use a single GPU (NVIDIA TITAN Xp) and 8 CPU cores (Intel Xeon E5-2630 v4).
Software Dependencies | No | All the packages used are provided along with the code.
Experiment Setup | Yes | All the experiments are conducted by fine-tuning RoBERTa-base (Liu et al., 2019) using the Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate of 1e-5 and the default hyperparameters of Adam. For the text classification tasks, the model is fine-tuned using the specified augmentation method with batch size 8 for 15 epochs. For the GLUE benchmark tasks, we commonly use batch size 16, except for the RTE task, where we use batch size 8, following Aghajanyan et al. (2021). (A fine-tuning sketch follows the table.)
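
The Algorithm 1 pseudocode itself is not reproduced in this report, but its name states the selection criterion: keep augmentations that are difficult for the current model, yet not too different from the original sample. The following is a minimal illustrative sketch of that criterion only, not the paper's actual DND algorithm; `rank_augmentations`, the cosine-similarity measure, and the `sim_threshold` value are all assumed names and choices.

```python
import torch
import torch.nn.functional as F

def rank_augmentations(orig_emb, aug_emb, aug_losses, sim_threshold=0.8):
    """Rank augmentation candidates for one original sample: prefer
    high-loss ("difficult") candidates, but drop those whose embedding
    drifts too far from the original ("too different")."""
    # Cosine similarity between the original and each augmented embedding.
    sims = F.cosine_similarity(orig_emb.unsqueeze(0), aug_emb, dim=-1)
    # Candidates below the (hypothetical) similarity cutoff are excluded.
    scores = torch.where(sims >= sim_threshold,
                         aug_losses,
                         torch.full_like(aug_losses, float("-inf")))
    # Indices of candidates, most difficult admissible ones first.
    return torch.argsort(scores, descending=True)

# Toy usage: 4 candidates with random embeddings and losses.
orig = torch.randn(768)
augs = torch.randn(4, 768)
losses = torch.rand(4)
print(rank_augmentations(orig, augs, losses))
```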
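
The 10% validation split is straightforward to reproduce. A minimal sketch using scikit-learn; the stratification and random seed are assumptions, since the paper does not state them:

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for a dataset that ships without a validation split.
texts = [f"example {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

# Hold out 10% of the training samples as the validation set.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=42)
```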
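
The experiment setup entry pins down the core fine-tuning configuration. Below is a minimal sketch with Hugging Face transformers for the text classification setting (batch size 8, 15 epochs, fixed learning rate 1e-5, default Adam hyperparameters); the toy batch and the 6-class head (as in TREC's coarse labels) are only there to make the sketch runnable and are not from the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=6)  # 6 coarse classes, as in TREC
# Fixed learning rate 1e-5 with Adam's default hyperparameters, per the report.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

texts = ["what is the capital of France ?"] * 8  # batch size 8
labels = torch.zeros(8, dtype=torch.long)        # dummy labels
batch = tokenizer(texts, return_tensors="pt", padding=True)

model.train()
for epoch in range(15):  # 15 epochs for the text classification tasks
    optimizer.zero_grad()
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
```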