Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime

Authors: Junfan Chen, Richong Zhang, Zheyan Luo, Chunming Hu, Yongyi Mao

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies on three benchmark datasets show that AWD can generate more effective data augmentations and outperform the state-of-the-art text data augmentation methods.
Researcher Affiliation | Academia | (1) SKLSDE, School of Computer Science and Engineering, Beihang University, Beijing, China; (2) Zhongguancun Laboratory, Beijing, China; (3) School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
Pseudocode | Yes | Algorithm 1: Training the Classification Model with AWD
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | SST-2: Stanford Sentiment Treebank (SST) (Socher et al. 2013) is a sentiment classification dataset. ... TREC (Li and Roth 2002) is a fine-grained question classification dataset. ... SNIPS (Coucke et al. 2018) is an English dataset for natural language understanding, which is widely used in intent classification.
Dataset Splits | Yes | We use the datasets provided by (Wu et al. 2022). To simulate low-resource text classification, we randomly select k = 10, 20, 50 examples for each class as the training sets. The training set with k = 10, the validation set, and the test set are the same as in (Wu et al. 2022). The data statistics are shown in Table 1. (A toy sketch of this per-class sampling appears below the table.)
Hardware Specification | Yes | All experiments are conducted on an NVIDIA Tesla P100 GPU with 16GB memory.
Software Dependencies | No | The paper states 'We implement our AWD model using Pytorch deep learning framework. The BERT-uncased Base model is used as the text classifier.' but does not provide specific version numbers for PyTorch, BERT, or other software dependencies.
Experiment Setup | Yes | The dimension of the word embedding d is set to 768. The learning rate is set to 5e-4. We train AWD and each baseline model for 30 epochs. The hyper-parameters for training AWD(strict) are λ = 1 and ρ = 0.3, 0.5, 0.3 for k = 10, 20, 50 respectively. When training AWD(strict), we perform 5 SGD updates in a dilution-network optimization step with a learning rate of 0.01. The hyper-parameter for training AWD(loose) is γ = 0.005. (A hedged PyTorch sketch using these values appears below the table.)
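
The Dataset Splits row above describes the low-resource construction: k = 10, 20, 50 examples are drawn at random for each class to form the training sets. Below is a minimal sketch of that per-class sampling, assuming a plain list of (text, label) pairs; the function name, the fixed seed, and the toy data are illustrative assumptions and do not reproduce the exact splits of Wu et al. (2022) that the paper uses.

```python
import random
from collections import defaultdict

def sample_low_resource(examples, k, seed=0):
    """Randomly keep k (text, label) examples per class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    subset = []
    for items in by_label.values():
        rng.shuffle(items)
        subset.extend(items[:k])
    rng.shuffle(subset)
    return subset

# Toy usage: simulate a k = 10 split for a 2-class dataset such as SST-2.
toy_train = [(f"example {i}", i % 2) for i in range(200)]
train_k10 = sample_low_resource(toy_train, k=10)
assert len(train_k10) == 20  # 10 examples per class, 2 classes
```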
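
The Pseudocode and Experiment Setup rows quote concrete training details: embedding dimension 768, classifier learning rate 5e-4, 30 epochs, λ = 1, ρ = 0.3 for k = 10, and 5 inner SGD updates of the dilution network at learning rate 0.01. The PyTorch sketch below wires those quoted values into a toy adversarial dilution loop. It is a hypothetical illustration only: the toy classifier (a stand-in for BERT-base-uncased), the dilution network, the mixing with an unknown-word embedding, and the squared penalty pulling the average dilution toward ρ are all assumptions, not the authors' Algorithm 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyper-parameters quoted in the report above (AWD(strict), k = 10).
EMBED_DIM = 768      # word embedding dimension d
OUTER_LR = 5e-4      # classifier learning rate
INNER_LR = 0.01      # dilution-network SGD learning rate
INNER_STEPS = 5      # SGD updates per dilution-network optimization step
EPOCHS = 30
LAMBDA, RHO = 1.0, 0.3

# Toy sizes (assumptions, purely to keep the example runnable).
VOCAB, SEQ_LEN, NUM_CLASSES, UNK, BATCH = 1000, 16, 2, 0, 8

class ToyClassifier(nn.Module):
    """Stand-in for the BERT-based classifier: mean-pooled embeddings + linear head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED_DIM)
        self.head = nn.Linear(EMBED_DIM, NUM_CLASSES)

    def from_embeddings(self, emb):
        return self.head(emb.mean(dim=1))

class DilutionNet(nn.Module):
    """Predicts a dilution weight in [0, 1] for every word position."""
    def __init__(self):
        super().__init__()
        self.scorer = nn.Linear(EMBED_DIM, 1)

    def forward(self, emb):
        return torch.sigmoid(self.scorer(emb)).squeeze(-1)  # (batch, seq)

def dilute(emb, unk_emb, weights):
    """Mix each word embedding with the unknown-word embedding."""
    w = weights.unsqueeze(-1)
    return (1 - w) * emb + w * unk_emb

classifier, dilution_net = ToyClassifier(), DilutionNet()
clf_opt = torch.optim.Adam(classifier.parameters(), lr=OUTER_LR)
dil_opt = torch.optim.SGD(dilution_net.parameters(), lr=INNER_LR)

# Toy low-resource batch: random token ids and labels.
tokens = torch.randint(1, VOCAB, (BATCH, SEQ_LEN))
labels = torch.randint(0, NUM_CLASSES, (BATCH,))

for _ in range(EPOCHS):
    emb = classifier.embed(tokens).detach()
    unk = classifier.embed.weight[UNK].detach()

    # Inner maximization: make the diluted batch harder for the classifier,
    # while a penalty keeps the average dilution close to RHO.
    for _ in range(INNER_STEPS):
        weights = dilution_net(emb)
        aug_logits = classifier.from_embeddings(dilute(emb, unk, weights))
        aug_loss = F.cross_entropy(aug_logits, labels)
        dil_loss = -aug_loss + LAMBDA * (weights.mean() - RHO).pow(2)
        dil_opt.zero_grad(); dil_loss.backward(); dil_opt.step()

    # Outer minimization: train the classifier on original + diluted examples.
    weights = dilution_net(emb).detach()
    live_emb = classifier.embed(tokens)
    live_unk = classifier.embed.weight[UNK]
    loss = F.cross_entropy(classifier.from_embeddings(live_emb), labels) \
         + F.cross_entropy(classifier.from_embeddings(dilute(live_emb, live_unk, weights)), labels)
    clf_opt.zero_grad(); loss.backward(); clf_opt.step()
```

The structure is the usual min-max pattern: the inner steps adapt the dilution weights so the diluted batch becomes harder for the current classifier, and the outer step trains the classifier on both the original and the diluted examples.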