MEDA: Meta-Learning with Data Augmentation for Few-Shot Text Classification

Authors: Pengfei Sun, Yawen Ouyang, Wenming Zhang, Xin-yu Dai

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment results show that on both datasets, MEDA outperforms existing state-of-the-art methods and significantly improves the performance of meta-learning on few-shot text classification.
Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University. {spf, ouyangyw, zhangwm}@smail.nju.edu.cn, daixinyu@nju.edu.cn
Pseudocode | Yes | Algorithm 1: Training Strategy
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the proposed MEDA method is openly available. It only mentions building on third-party resources such as pre-trained BERT and the Sentence-Transformers library.
Open Datasets | Yes | To prove the effectiveness of our proposed model, we evaluate MEDA on two publicly available datasets for the few-shot scenario: SNIPS [Coucke et al., 2018] (https://github.com/snipsco/nlu-benchmark/) and ARSC [Yu et al., 2018] (https://github.com/Gorov/DiverseFewShot_Amazon).
Dataset Splits | Yes | SNIPS. As SNIPS is not a benchmark for few-shot learning, we first construct few-shot splits to simulate the few-shot scenario. We divide the original 7 intents into 5 intents as Ctrain and 2 intents (Add To Playlist, Rate Book) as Ctest. Ctrain and Ctest are used as the training set and test set, respectively. Thus, we evaluate performance on Ctest under 2-way-K-shot settings, where K = 3, 5, 10. ARSC. For the ARSC dataset, we partition the data following [Yu et al., 2018]... we also select 12 (4 × 3) tasks from four domains (Books, DVD, Electronics, Kitchen) as the test set... All hyper-parameters of MEDA are cross-validated on the validation set using a coarse grid search. (See the episode-sampling sketch below the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments are mentioned in the paper.
Software Dependencies | No | Our implementation is based on PyTorch. We experiment with pre-trained BERT [Devlin et al., 2019] using the Sentence-Transformers codebase [Reimers and Gurevych, 2019]. Adam [Kingma and Ba, 2015] is used to train MEDA in an end-to-end manner. (No version numbers are provided for PyTorch or Sentence-Transformers; see the encoding sketch below the table.)
Experiment Setup | Yes | The initial learning rate is 1e-3. In the loss function, we set λ = 1 and r = 1. To avoid overfitting, we use dropout with a 0.2 dropout rate. We generate 10 samples using different augmentation methods for the given class. (See the configuration sketch below the table.)
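
The SNIPS split described in the Dataset Splits row is straightforward to reproduce. Below is a minimal Python sketch of 2-way-K-shot episode sampling, assuming the utterances are already grouped into a dict mapping each intent to its list of sentences; the function name, the query-set size, and the toy data are illustrative assumptions, not details from the paper.

    import random

    # The paper's split: 5 SNIPS intents form Ctrain, 2 intents form Ctest.
    TEST_INTENTS = {"AddToPlaylist", "RateBook"}

    def sample_episode(data_by_intent, n_way=2, k_shot=5, n_query=10):
        """Sample one n-way-K-shot episode: (support, query) lists of
        (utterance, intent) pairs drawn without overlap from each class."""
        intents = random.sample(sorted(data_by_intent), n_way)
        support, query = [], []
        for intent in intents:
            examples = random.sample(data_by_intent[intent], k_shot + n_query)
            support += [(text, intent) for text in examples[:k_shot]]
            query += [(text, intent) for text in examples[k_shot:]]
        return support, query

    # Toy usage for the 2-way-3-shot setting (K = 3).
    toy_data = {
        "AddToPlaylist": [f"add song {i} to my playlist" for i in range(20)],
        "RateBook": [f"rate book {i} five stars" for i in range(20)],
    }
    support, query = sample_episode(toy_data, n_way=2, k_shot=3)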
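
Because the Software Dependencies row pins no versions, a reproducer has to guess compatible releases. The sketch below shows the BERT sentence-encoding step through the Sentence-Transformers API; the checkpoint name is an assumption, since the paper does not say which pre-trained BERT weights were used.

    from sentence_transformers import SentenceTransformer

    # Checkpoint name is a guess; any BERT-based Sentence-Transformers
    # model exposes the same encode() interface.
    encoder = SentenceTransformer("bert-base-nli-mean-tokens")

    sentences = ["add this song to my playlist", "rate this book five stars"]
    embeddings = encoder.encode(sentences)  # numpy array, one row per sentence
    print(embeddings.shape)  # (2, 768) for a BERT-base encoder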
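
The Experiment Setup row lists enough hyper-parameters to sketch the optimizer and regularization configuration. The model below is a placeholder classifier head standing in for MEDA (the real architecture is defined in the paper), and the composite loss with λ and r is only noted in a comment, since its exact form is not reproduced in this row.

    import torch
    import torch.nn as nn

    # Hyper-parameters reported in the paper.
    LEARNING_RATE = 1e-3
    LAMBDA = 1.0        # lambda in the loss function
    R = 1.0             # r in the loss function
    DROPOUT_RATE = 0.2
    N_AUGMENTED = 10    # augmented samples generated per class

    # Placeholder head: 768 matches the BERT sentence-embedding size,
    # 2 matches the n-way of a SNIPS test episode. Not the MEDA model.
    model = nn.Sequential(nn.Dropout(DROPOUT_RATE), nn.Linear(768, 2))

    # Adam, trained end-to-end, as stated in the Software Dependencies row.
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

    # Total loss = classification term + LAMBDA-weighted auxiliary term
    # involving R; see the paper for the exact definition.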