Few-shot Text Classification with Distributional Signatures

Authors: Yujia Bao, Menghua Wu, Shiyu Chang, Regina Barzilay

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model on five standard text classification datasets (Lang, 1995; Lewis et al., 2004; Lewis, 1997; He & McAuley, 2016; Misra, 2018) and one relation classification dataset (Han et al., 2018). Experimental results demonstrate that our model delivers significant performance gains over all baselines. For instance, our model outperforms prototypical networks by 20.6% on average in one-shot text classification and 17.3% in one-shot relation classification.
Researcher Affiliation | Collaboration | Computer Science and Artificial Intelligence Lab, MIT; MIT-IBM Watson AI Lab, IBM Research. {yujia,rmwu,regina}@csail.mit.edu, {shiyu.chang}@ibm.com
Pseudocode | Yes | Algorithm 1 contains the pseudo code for our learning procedure.
Open Source Code | Yes | Our code is available at https://github.com/YujiaBao/Distributional-Signatures.
Open Datasets | Yes | All processed datasets along with their splits are publicly available. (See Appendix A.4 for more details.)
Dataset Splits | Yes | During meta-training, we sample 100 training episodes per epoch. We apply early stopping when the validation loss fails to improve for 20 epochs. We evaluate test performance based on 1000 testing episodes and report the average accuracy over 5 different random seeds. (This episodic protocol is sketched in code after the table.)
Hardware Specification | Yes | less than 1 second on a single GeForce GTX TITAN X
Software Dependencies | No | The paper mentions specific software components such as the Adam optimizer, a bi-directional LSTM, BERT embeddings, and Hugging Face's codebase, but it does not provide version numbers for these libraries or frameworks (e.g., the PyTorch or HuggingFace Transformers version).
Experiment Setup | Yes | In the attention generator, we use a bi-directional LSTM with 50 hidden units and apply dropout of 0.1 on the output (Srivastava et al., 2014). In the ridge regressor, we optimize meta-parameters λ and a in the log space to maintain the positivity constraint. All parameters are optimized using Adam with a learning rate of 0.001 (Kingma & Ba, 2014). During meta-training, we sample 100 training episodes per epoch. We apply early stopping when the validation loss fails to improve for 20 epochs. (This configuration is sketched in code after the table.)
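
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the reported configuration: a bi-directional LSTM attention generator with 50 hidden units and dropout 0.1 on its output, ridge-regressor meta-parameters λ and a stored in log space so they stay positive, and Adam with a learning rate of 0.001. The names AttentionGenerator, log_lam, and log_a, as well as the input dimensionality, are illustrative assumptions and are not taken from the authors' released code.

    # Illustrative sketch of the reported hyperparameters; names are assumptions.
    import torch
    import torch.nn as nn

    class AttentionGenerator(nn.Module):
        """Bi-directional LSTM (50 hidden units) with dropout 0.1 on the output."""
        def __init__(self, input_dim, hidden_dim=50, dropout=0.1):
            super().__init__()
            self.lstm = nn.LSTM(input_dim, hidden_dim,
                                bidirectional=True, batch_first=True)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x):          # x: (batch, seq_len, input_dim)
            out, _ = self.lstm(x)      # out: (batch, seq_len, 2 * hidden_dim)
            return self.dropout(out)

    # Ridge-regressor meta-parameters, optimized in log space so that
    # exp(log_lam) and exp(log_a) remain positive.
    log_lam = nn.Parameter(torch.zeros(1))
    log_a = nn.Parameter(torch.zeros(1))

    # input_dim=2 assumes two distributional-signature features per word.
    model = AttentionGenerator(input_dim=2)
    optimizer = torch.optim.Adam(list(model.parameters()) + [log_lam, log_a],
                                 lr=1e-3)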
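
The Dataset Splits row describes the episodic training and evaluation protocol; the loop below is a hedged sketch of it under the reported numbers (100 training episodes per epoch, early-stopping patience of 20 epochs on validation loss, and test accuracy averaged over 1000 episodes). The helpers sample_episode, train_on_episode, episode_loss, and episode_accuracy, along with the number of validation episodes, are hypothetical placeholders rather than functions from the released code.

    # Hedged sketch of the episodic protocol; all helper functions are hypothetical.
    def run_protocol(model, train_classes, val_classes, test_classes,
                     max_epochs=1000, n_val_episodes=100):
        best_val, bad_epochs, patience = float("inf"), 0, 20

        for epoch in range(max_epochs):
            for _ in range(100):                    # 100 training episodes per epoch
                train_on_episode(model, sample_episode(train_classes))

            val_loss = sum(episode_loss(model, sample_episode(val_classes))
                           for _ in range(n_val_episodes)) / n_val_episodes
            if val_loss < best_val:                 # early stopping on validation loss
                best_val, bad_epochs = val_loss, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:          # patience of 20 epochs
                    break

        # Test accuracy averaged over 1000 testing episodes (a single seed).
        return sum(episode_accuracy(model, sample_episode(test_classes))
                   for _ in range(1000)) / 1000

Averaging the result of this loop over 5 different random seeds would mirror the reported evaluation.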