MEDA: Meta-Learning with Data Augmentation for Few-Shot Text Classification
Authors: Pengfei Sun, Yawen Ouyang, Wenming Zhang, Xin-yu Dai
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results show that on both datasets, MEDA outperforms existing state-of-the-art methods and significantly improves the performance of meta-learning on few-shot text classification. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University {spf, ouyangyw, zhangwm}@smail.nju.edu.cn, daixinyu@nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Training Strategy |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the proposed MEDA method is openly available. It only mentions using third-party libraries like BERT and Sentence-Transformers. |
| Open Datasets | Yes | To prove the effectiveness of our proposed model, we evaluate the MEDA on two publicly available datasets for the few-shot scenario: SNIPS1 [Coucke et al., 2018] and ARSC2 [Yu et al., 2018]. 1https://github.com/snipsco/nlu-benchmark/ 2https://github.com/Gorov/Diverse Few Shot Amazon |
| Dataset Splits | Yes | SNIPS. As SNIPS is not a benchmark for few-shot learning, we first construct few-shot splits to simulate the few-shot scenario. We divide the original 7 intents into 5 intents as Ctrain and 2 intents (Add To Playlist, Rate Book) as Ctest. The Ctrain and Ctest are used as training set and test set respectively. Thus, we evaluate the performance on Ctest, i.e., 2-way-K-shot settings, where K=3,5,10. ARSC. For ARSC dataset, we partition datasets following [Yu et al., 2018]... we also select 12 (4 × 3) tasks from four domains (Books, DVD, Electronics, Kitchen) as the test set... All hyper-parameters of the MEDA are cross-validated on the validation set using a coarse grid search. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | Our implementation is based on Pytorch. We experiment with pre-trained BERT [Devlin et al., 2019] using Sentence-Transformers codebase [Reimers and Gurevych, 2019]. Adam [Kingma and Ba, 2015] is used to train the MEDA in an end-to-end manner. (No version numbers provided for Pytorch or Sentence-Transformers). |
| Experiment Setup | Yes | The initial learning rate is 1e-3. In the loss function, we set λ = 1 and r = 1. To avoid overfitting, we use dropout with 0.2 dropout rate. We generate 10 samples using different augmentation methods for the given class. |
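The 2-way-K-shot evaluation protocol quoted in the Dataset Splits row can be sketched as episode sampling over the two SNIPS test intents. This is a minimal illustrative sketch, not the authors' code: the function name `sample_episode`, the query-set size, and the toy data are all assumptions for demonstration.

```python
import random

def sample_episode(data, classes, k_shot, n_query=5, seed=None):
    """Sample one N-way-K-shot episode (hypothetical helper):
    a support set of k_shot examples per class plus a disjoint
    query set of n_query examples per class."""
    rng = random.Random(seed)
    support, query = [], []
    for label in classes:
        # Draw support and query examples without overlap within the episode.
        examples = rng.sample(data[label], k_shot + n_query)
        support += [(text, label) for text in examples[:k_shot]]
        query += [(text, label) for text in examples[k_shot:]]
    return support, query

# Toy stand-ins for the two SNIPS test intents; K = 3, 5, or 10 per the paper.
data = {
    "AddToPlaylist": [f"add song {i} to my playlist" for i in range(20)],
    "RateBook": [f"rate book {i} five stars" for i in range(20)],
}
support, query = sample_episode(data, ["AddToPlaylist", "RateBook"], k_shot=5, seed=0)
print(len(support), len(query))  # 2 classes x 5 shots, 2 classes x 5 queries
```

Per-episode sampling like this is how few-shot evaluation is typically run on a non-benchmark dataset such as SNIPS; accuracy is then averaged over many sampled episodes on the held-out intents.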