Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Authors: Ningyu Zhang, Luoqiu Li, Xiang Chen, Shumin Deng, Zhen Bi, Chuanqi Tan, Fei Huang, Huajun Chen

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A comprehensive evaluation of standard NLP tasks demonstrates that the proposed approach achieves a better few-shot performance.
Researcher Affiliation | Collaboration | (1) College of Computer Science and Technology, Zhejiang University; (2) School of Software Technology, Zhejiang University; (3) Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies; (4) Hangzhou Innovation Center, Zhejiang University; (5) Alibaba Group
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available in https://github.com/zjunlp/DART. ... Our code is available in https://github.com/zjunlp/DART for reproducibility.
Open Datasets | Yes | We conduct a comprehensive study across 15 NLP tasks, which covers sentiment analysis, natural language inference, paraphrases, sentence similarity, relation extraction, and event extraction (we only report event argument extraction performance). The evaluation consisted of 10 popular sentence classification datasets (SST-2, MR, CR, Subj, TREC, MNLI, SNLI, QNLI, MRPC, QQP). To further evaluate the effectiveness of the proposed approach with a complex label space, we conduct experiments on the relation extraction and event extraction datasets, including SemEval-2010 Task 8 (Hendrickx et al., 2010), TACRED-Revisit (Alt et al., 2020), Wiki80 (Han et al., 2019), ChemProt (Kringelum et al., 2016), and ACE-2005.
Dataset Splits | Yes | We utilize a grid search over multiple hyperparameters and select the best result as measured on D_dev for each set {D_train^s, D_dev}, s ∈ S_seed.
Hardware Specification | Yes | We utilize Pytorch (Paszke et al., 2019) to conduct experiments with 1 Nvidia 3090 GPU.
Software Dependencies | No | The paper mentions using 'Pytorch' but does not specify a version number for it. No other software libraries or tools are mentioned with specific version numbers.
Experiment Setup | Yes | The hyper-parameter search space is (the optimal set of parameters may vary across different tasks and data splits): learning rate: [1e-5, 5e-5, 1e-4, 2e-4]; weight decay: [0.0, 0.01, 0.05, 0.10]; number of epochs: [20, 30]; batch size: [4, 8, 16, 24, 32]; max seq length: 128; gradient accumulation steps: [1, 2]