Few-shot Text Classification with Distributional Signatures
Authors: Yujia Bao, Menghua Wu, Shiyu Chang, Regina Barzilay
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on five standard text classification datasets (Lang, 1995; Lewis et al., 2004; Lewis, 1997; He & McAuley, 2016; Misra, 2018) and one relation classification dataset (Han et al., 2018). Experimental results demonstrate that our model delivers significant performance gains over all baselines. For instance, our model outperforms prototypical networks by 20.6% on average in one-shot text classification and 17.3% in one-shot relation classification. |
| Researcher Affiliation | Collaboration | Computer Science and Artificial Intelligence Lab, MIT; MIT-IBM Watson AI Lab, IBM Research. {yujia,rmwu,regina}@csail.mit.edu, {shiyu.chang}@ibm.com |
| Pseudocode | Yes | Algorithm 1 contains the pseudo code for our learning procedure. |
| Open Source Code | Yes | Our code is available at https://github.com/YujiaBao/Distributional-Signatures. |
| Open Datasets | Yes | All processed datasets along with their splits are publicly available. (See Appendix A.4 for more details.) |
| Dataset Splits | Yes | During meta-training, we sample 100 training episodes per epoch. We apply early stopping when the validation loss fails to improve for 20 epochs. We evaluate test performance based on 1000 testing episodes and report the average accuracy over 5 different random seeds. |
| Hardware Specification | Yes | less than 1 second on a single GeForce GTX TITAN X |
| Software Dependencies | No | The paper mentions specific software components such as the Adam optimizer, a bi-directional LSTM, BERT embeddings, and Hugging Face's codebase, but it does not provide version numbers for the underlying libraries or frameworks (e.g., the PyTorch or HuggingFace Transformers version). |
| Experiment Setup | Yes | In the attention generator, we use a bi-directional LSTM with 50 hidden units and apply dropout of 0.1 on the output (Srivastava et al., 2014). In the ridge regressor, we optimize meta-parameters λ and a in the log space to maintain the positivity constraint. All parameters are optimized using Adam with a learning rate of 0.001 (Kingma & Ba, 2014). During meta-training, we sample 100 training episodes per epoch. We apply early stopping when the validation loss fails to improve for 20 epochs. |
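
The Experiment Setup and Dataset Splits rows above describe the core training recipe: a bi-directional LSTM attention generator with 50 hidden units and dropout 0.1, a ridge-regression head whose meta-parameters λ and a are optimized in log space to stay positive, Adam at learning rate 0.001, 100 training episodes per epoch, and early stopping after 20 epochs without validation improvement. The sketch below shows one way that recipe could be wired up in PyTorch; it is an illustration under those stated assumptions, not the authors' released implementation (see the repository linked above for that), and names such as `AttentionGenerator`, `RidgeRegressor`, `FewShotModel`, `sample_episode`, and `validation_loss` are hypothetical placeholders.

```python
# Minimal sketch of the quoted setup; assumes PyTorch and pre-computed word
# embeddings of dimension emb_dim. Not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGenerator(nn.Module):
    """Bi-directional LSTM (50 hidden units) with dropout 0.1 on its output."""
    def __init__(self, emb_dim, hidden=50, dropout=0.1):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, x):                        # x: (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)                      # (batch, seq_len, 2 * hidden)
        h = self.dropout(h)
        return torch.softmax(self.score(h).squeeze(-1), dim=-1)  # word weights


class RidgeRegressor(nn.Module):
    """Closed-form ridge head; lambda and a live in log space so they stay positive."""
    def __init__(self):
        super().__init__()
        self.log_lam = nn.Parameter(torch.zeros(1))
        self.log_a = nn.Parameter(torch.zeros(1))

    def forward(self, support_x, support_y, query_x):
        # support_y: one-hot labels of shape (n_support, n_classes)
        lam, a = self.log_lam.exp(), self.log_a.exp()
        eye = torch.eye(support_x.size(1), device=support_x.device)
        # W = (X^T X + lambda I)^{-1} X^T Y, fit on the support set only
        w = torch.linalg.solve(support_x.t() @ support_x + lam * eye,
                               support_x.t() @ support_y)
        return a * (query_x @ w)                 # scaled logits for the query set


class FewShotModel(nn.Module):
    """Attention-weighted average of word embeddings fed to the ridge head."""
    def __init__(self, emb_dim):
        super().__init__()
        self.attn = AttentionGenerator(emb_dim)
        self.head = RidgeRegressor()

    def embed(self, x):
        return (self.attn(x).unsqueeze(-1) * x).sum(dim=1)      # (batch, emb_dim)

    def forward(self, support_x, support_y, query_x):
        return self.head(self.embed(support_x), support_y, self.embed(query_x))


def meta_train(model, sample_episode, validation_loss, max_epochs=1000):
    """Episodic meta-training: 100 episodes per epoch, Adam at lr 1e-3,
    stop once validation loss has not improved for 20 consecutive epochs."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    best, stale = float("inf"), 0
    for _ in range(max_epochs):
        for _ in range(100):
            # sample_episode is a placeholder that yields one N-way K-shot episode
            support_x, support_y, query_x, query_y = sample_episode("train")
            loss = F.cross_entropy(model(support_x, support_y, query_x), query_y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        val = validation_loss(model)             # placeholder: loss on val episodes
        if val < best:
            best, stale = val, 0
        else:
            stale += 1
            if stale >= 20:                      # early-stopping patience of 20
                break
```

Evaluation under the quoted protocol would then average accuracy over 1000 test episodes and report the mean across 5 random seeds.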