Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime
Authors: Junfan Chen, Richong Zhang, Zheyan Luo, Chunming Hu, Yongyi Mao
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies on three benchmark datasets show that AWD can generate more effective data augmentations and outperform the state-of-the-art text data augmentation methods. |
| Researcher Affiliation | Academia | (1) SKLSDE, School of Computer Science and Engineering, Beihang University, Beijing, China; (2) Zhongguancun Laboratory, Beijing, China; (3) School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada |
| Pseudocode | Yes | Algorithm 1: Training the Classification Model with AWD |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | SST-2 Stanford Sentiment Treebank (SST) (Socher et al. 2013) is a sentiment classification dataset. ... TREC (Li and Roth 2002) is a fine-grained question classification dataset. ... SNIPS (Coucke et al. 2018) is an English dataset for natural language understanding, which is widely used in intent classification. |
| Dataset Splits | Yes | We use the datasets provided by (Wu et al. 2022). To simulate low-resource text classification, we randomly select k = 10, 20, 50 examples for each class as the training sets. The training set with k = 10, the validation set, and the test set are the same as in (Wu et al. 2022). The data statistics are shown in Table 1. (A hedged sketch of this per-class sampling appears after the table.) |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA Tesla P100 GPU with 16GB memory. |
| Software Dependencies | No | The paper states 'We implement our AWD model using Pytorch deep learning framework. The BERT-uncased Base model is used as the text classifier.', but does not provide specific version numbers for PyTorch, BERT, or other software dependencies. |
| Experiment Setup | Yes | The dimension of the word embedding d is set to 768. The learning rate is set to 5e-4. We train AWD and each baseline model for 30 epochs. The hyper-parameters for training AWD(strict) are λ = 1 and ρ = 0.3, 0.5, 0.3 for k = 10, 20, 50, respectively. When training AWD(strict), we perform 5 SGD updates in a dilution-network optimization step with a learning rate of 0.01. The hyper-parameter for training AWD(loose) is γ = 0.005. (A hedged sketch of this training step appears after the table.) |
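The Dataset Splits row describes simulating the low-resource regime by randomly keeping k = 10, 20, or 50 training examples per class, with the validation and test sets taken unchanged from (Wu et al. 2022). A minimal sketch of such per-class subsampling is shown below; the function name and the assumed data format (a list of `(text, label)` pairs) are illustrative, not the authors' code.

```python
import random
from collections import defaultdict

def sample_low_resource(train_set, k, seed=0):
    """Return a training subset with at most k randomly chosen examples per class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in train_set:
        by_label[label].append((text, label))
    subset = []
    for examples in by_label.values():
        rng.shuffle(examples)
        subset.extend(examples[:k])
    rng.shuffle(subset)
    return subset
```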
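The Pseudocode row references Algorithm 1 (training the classification model with AWD), and the Experiment Setup row reports its hyper-parameters (λ = 1, per-k values of ρ, 5 inner SGD updates at learning rate 0.01). The PyTorch sketch below illustrates how such a constrained min-max step could be organized: dilution weights mix each word embedding with an unknown-word embedding to produce hard positive examples, the dilution network is updated to maximize the classifier loss under a constraint on the weights, and the classifier is then trained on both original and diluted examples. This is a minimal sketch under assumptions; `dilute`, `dilution_net`, the penalty form of the constraint, and a classifier that consumes 768-dimensional input embeddings directly are illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def dilute(word_embeds, unk_embed, weights):
    # Mix each word embedding with the unknown-word embedding; weight 0 keeps the
    # original word, weight 1 replaces it entirely with UNK.
    return (1.0 - weights) * word_embeds + weights * unk_embed

def awd_strict_step(classifier, dilution_net, embeds, labels, unk_embed,
                    lam=1.0, rho=0.3, inner_steps=5, inner_lr=0.01):
    """One AWD(strict)-style min-max step: learn dilution weights adversarially,
    then return the classifier loss on original plus diluted examples."""
    inner_opt = torch.optim.SGD(dilution_net.parameters(), lr=inner_lr)

    # Inner maximization: make the diluted examples hard for the current
    # classifier, with a penalty (weighted by lam) discouraging an average
    # dilution weight above rho.
    for _ in range(inner_steps):
        weights = dilution_net(embeds)               # (batch, seq_len, 1) in [0, 1]
        diluted = dilute(embeds, unk_embed, weights)
        adv_loss = F.cross_entropy(classifier(diluted), labels)
        constraint = F.relu(weights.mean() - rho)
        inner_opt.zero_grad()
        (-adv_loss + lam * constraint).backward()
        inner_opt.step()

    # Discard gradients accumulated on the classifier during the inner loop.
    classifier.zero_grad()

    # Outer minimization: the caller backpropagates this loss and updates the
    # classifier (e.g. with the reported learning rate of 5e-4).
    with torch.no_grad():
        weights = dilution_net(embeds)
    diluted = dilute(embeds, unk_embed, weights)
    return (F.cross_entropy(classifier(embeds), labels)
            + F.cross_entropy(classifier(diluted), labels))
```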