Few-shot Text Classification with Distributional Signatures

Authors: Yujia Bao, Menghua Wu, Shiyu Chang, Regina Barzilay

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model on five standard text classification datasets (Lang, 1995; Lewis et al., 2004; Lewis, 1997; He & McAuley, 2016; Misra, 2018) and one relation classification dataset (Han et al., 2018). Experimental results demonstrate that our model delivers significant performance gains over all baselines. For instance, our model outperforms prototypical networks by 20.6% on average in one-shot text classification and 17.3% in one-shot relation classification.
Researcher Affiliation | Collaboration | Computer Science and Artificial Intelligence Lab, MIT; MIT-IBM Watson AI Lab, IBM Research. {yujia,rmwu,regina}@csail.mit.edu, {shiyu.chang}@ibm.com
Pseudocode | Yes | Algorithm 1 contains the pseudo code for our learning procedure.
Open Source Code | Yes | Our code is available at https://github.com/YujiaBao/Distributional-Signatures.
Open Datasets | Yes | All processed datasets along with their splits are publicly available. (See Appendix A.4 for more details.)
Dataset Splits | Yes | During meta-training, we sample 100 training episodes per epoch. We apply early stopping when the validation loss fails to improve for 20 epochs. We evaluate test performance based on 1000 testing episodes and report the average accuracy over 5 different random seeds. (This episodic protocol is sketched in code after the table.)
Hardware Specification | Yes | less than 1 second on a single GeForce GTX TITAN X
Software Dependencies | No | The paper mentions specific software components such as the Adam optimizer, a bi-directional LSTM, BERT embeddings, and Hugging Face's codebase, but it does not provide version numbers for these libraries or frameworks (e.g., the PyTorch or HuggingFace Transformers version).
Experiment Setup | Yes | In the attention generator, we use a bi-directional LSTM with 50 hidden units and apply dropout of 0.1 on the output (Srivastava et al., 2014). In the ridge regressor, we optimize meta-parameters λ and a in the log space to maintain the positivity constraint. All parameters are optimized using Adam with a learning rate of 0.001 (Kingma & Ba, 2014). During meta-training, we sample 100 training episodes per epoch. We apply early stopping when the validation loss fails to improve for 20 epochs. (This configuration is sketched in code after the table.)
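
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the reported configuration: a bi-directional LSTM attention generator with 50 hidden units and dropout 0.1 on its output, ridge-regressor meta-parameters λ and a stored in log space so they stay positive, and Adam with a learning rate of 0.001. The names AttentionGenerator, log_lam, and log_a, as well as the input dimensionality, are illustrative assumptions and are not taken from the authors' released code.

    # Illustrative sketch of the reported hyperparameters; names are assumptions.
    import torch
    import torch.nn as nn

    class AttentionGenerator(nn.Module):
        """Bi-directional LSTM (50 hidden units) with dropout 0.1 on the output."""
        def __init__(self, input_dim, hidden_dim=50, dropout=0.1):
            super().__init__()
            self.lstm = nn.LSTM(input_dim, hidden_dim,
                                bidirectional=True, batch_first=True)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x):          # x: (batch, seq_len, input_dim)
            out, _ = self.lstm(x)      # out: (batch, seq_len, 2 * hidden_dim)
            return self.dropout(out)

    # Ridge-regressor meta-parameters, optimized in log space so that
    # exp(log_lam) and exp(log_a) remain positive.
    log_lam = nn.Parameter(torch.zeros(1))
    log_a = nn.Parameter(torch.zeros(1))

    # input_dim=2 assumes two distributional-signature features per word.
    model = AttentionGenerator(input_dim=2)
    optimizer = torch.optim.Adam(list(model.parameters()) + [log_lam, log_a],
                                 lr=1e-3)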
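
The Dataset Splits row describes the episodic training and evaluation protocol; the loop below is a hedged sketch of it under the reported numbers (100 training episodes per epoch, early-stopping patience of 20 epochs on validation loss, and test accuracy averaged over 1000 episodes). The helpers sample_episode, train_on_episode, episode_loss, and episode_accuracy, along with the number of validation episodes, are hypothetical placeholders rather than functions from the released code.

    # Hedged sketch of the episodic protocol; all helper functions are hypothetical.
    def run_protocol(model, train_classes, val_classes, test_classes,
                     max_epochs=1000, n_val_episodes=100):
        best_val, bad_epochs, patience = float("inf"), 0, 20

        for epoch in range(max_epochs):
            for _ in range(100):                    # 100 training episodes per epoch
                train_on_episode(model, sample_episode(train_classes))

            val_loss = sum(episode_loss(model, sample_episode(val_classes))
                           for _ in range(n_val_episodes)) / n_val_episodes
            if val_loss < best_val:                 # early stopping on validation loss
                best_val, bad_epochs = val_loss, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:          # patience of 20 epochs
                    break

        # Test accuracy averaged over 1000 testing episodes (a single seed).
        return sum(episode_accuracy(model, sample_episode(test_classes))
                   for _ in range(1000)) / 1000

Averaging the result of this loop over 5 different random seeds would mirror the reported evaluation.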