Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners
Authors: Ningyu Zhang, Luoqiu Li, Xiang Chen, Shumin Deng, Zhen Bi, Chuanqi Tan, Fei Huang, Huajun Chen
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A comprehensive evaluation of standard NLP tasks demonstrates that the proposed approach achieves a better few-shot performance. |
| Researcher Affiliation | Collaboration | 1College of Computer Science and Technology, Zhejiang University 2School of Software Technology, Zhejiang University 3Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies 4Hangzhou Innovation Center, Zhejiang University 5Alibaba Group |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available in https://github.com/zjunlp/DART. ... Our code is available in https://github.com/zjunlp/DART for reproducibility. |
| Open Datasets | Yes | We conduct a comprehensive study across 15 NLP tasks, which covers sentiment analysis, natural language inference, paraphrases, sentence similarity, relation extraction, and event extraction (We only report event argument extraction performance). The evaluation consisted of 10 popular sentence classification datasets (SST-2, MR, CR, Subj, TREC, MNLI, SNLI, QNLI, MRPC, QQP). To further evaluate the effectiveness of the proposed approach with complex label space, we conduct experiments on the relation extraction and event extraction datasets, including SemEval-2010 Task 8 (Hendrickx et al., 2010), TACRED-Revisit (Alt et al., 2020), Wiki80 (Han et al., 2019), ChemProt (Kringelum et al., 2016), and ACE-2005. |
| Dataset Splits | Yes | We utilize a grid search over multiple hyperparameters and select the best result as measured on D_dev for each set {D_train^s, D_dev}, s ∈ S_seed. |
| Hardware Specification | Yes | We utilize Pytorch (Paszke et al., 2019) to conduct experiments with 1 Nvidia 3090 GPU. |
| Software Dependencies | No | The paper mentions using 'Pytorch' but does not specify a version number for it. No other software libraries or tools are mentioned with specific version numbers. |
| Experiment Setup | Yes | The hyper-parameter search space is (the optimal set of parameters may vary across different tasks and data splits): learning rate: [1e-5, 5e-5, 1e-4, 2e-4]; weight decay: [0.0, 0.01, 0.05, 0.10]; number of epochs: [20, 30]; batch size: [4, 8, 16, 24, 32]; max seq length: 128; gradient accumulation steps: [1, 2] |
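The grid search over the reported hyper-parameter space can be sketched as follows. This is a minimal illustration, not DART's actual training code (which lives in the linked repository); the dictionary keys and the `grid` helper are hypothetical names, and max sequence length is omitted since the paper fixes it at 128.

```python
import itertools

# Hyper-parameter search space as reported in the paper
# (max seq length is fixed at 128 and not searched over)
search_space = {
    "learning_rate": [1e-5, 5e-5, 1e-4, 2e-4],
    "weight_decay": [0.0, 0.01, 0.05, 0.10],
    "num_epochs": [20, 30],
    "batch_size": [4, 8, 16, 24, 32],
    "gradient_accumulation_steps": [1, 2],
}

def grid(space):
    """Yield every hyper-parameter combination as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(search_space))
print(len(configs))  # 4 * 4 * 2 * 5 * 2 = 320 combinations per task/split
```

In the paper's protocol, each of these configurations would be trained on D_train^s and scored on D_dev, keeping the best-scoring configuration per task and data split.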