Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks
Authors: Bo Li, Wei Ye, Quansen Wang, Wen Zhao, Shikun Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method extensively on 8 NLU tasks with 14 datasets. The experimental results show that Mask Matching significantly outperforms its counterparts of fine-tuning and conventional prompt-tuning, setting up state-of-the-art performances in several datasets. |
| Researcher Affiliation | Academia | 1National Engineering Research Center for Software Engineering, Peking University 2School of Software and Microelectronics, Peking University 3Boston University |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We use 8 different natural language understanding tasks across 14 datasets to verify the effectiveness of our proposed method. The metric and class numbers for each dataset are shown in Table 1. ... Some of them are selected from the GLUE benchmark (Wang et al. 2019b) and SuperGLUE (Wang et al. 2019a), others are popular in various specific research fields, such as entity typing, relation classification, and stance detection. |
| Dataset Splits | Yes | For a fair comparison, all the datasets and the data split are the same as in previous works. ... We randomly select 10% of the whole training set, and keep the development and test sets unchanged. ... We report the performance on the development sets of WiC, QNLI, SNLI, and QQP, as previous works (Liu et al. 2019; He, Gao, and Chen 2021; Wang et al. 2021; Bajaj et al. 2022) only presented the single model's performances on the development set. (A subsampling sketch follows the table.) |
| Hardware Specification | Yes | We use Pytorch (Paszke et al. 2019) and Tesla T4 GPU in our experiments. |
| Software Dependencies | No | The paper mentions PyTorch (Paszke et al. 2019) and specific pre-trained models (RoBERTa-large, BERT-large) from HuggingFace, but does not report version numbers for PyTorch or any other software dependency. |
| Experiment Setup | Yes | Specifically, we implement a batch size of 8, with a gradient accumulation of 4, and employ the AdamW optimizer (Loshchilov and Hutter 2019), with a learning rate of 1e-5 and a warm-up ratio of 0.2 (Goyal et al. 2017) for all datasets. ... The training epoch is set to 20, and the maximum input length is limited to 500. (A configuration sketch follows the table.) |
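
The 10% low-resource subsampling quoted in the Dataset Splits row can be mirrored along the following lines. This is a minimal sketch using the HuggingFace `datasets` library, assuming QNLI as the example task and a random seed of 42; the paper does not state which tooling or seed was actually used.

```python
from datasets import load_dataset

# QNLI is used here only as an example; any of the paper's 14 datasets
# could be substituted.
dataset = load_dataset("glue", "qnli")

# Low-resource setting: randomly keep 10% of the training set while
# leaving the development (validation) and test splits untouched.
train = dataset["train"].shuffle(seed=42)          # seed is an assumption
low_resource_train = train.select(range(int(0.1 * len(train))))

dev_set = dataset["validation"]                    # unchanged
test_set = dataset["test"]                         # unchanged
```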
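
The Experiment Setup row can likewise be encoded as a training configuration. The sketch below uses the HuggingFace `transformers` Trainer arguments as one plausible realization; since the authors released no code, the backbone checkpoint, output path, and text field name are assumptions rather than the paper's actual implementation.

```python
from transformers import AutoTokenizer, TrainingArguments

backbone = "roberta-large"            # one of the backbones used in the paper
tokenizer = AutoTokenizer.from_pretrained(backbone)

# Hyperparameters quoted in the Experiment Setup row.
training_args = TrainingArguments(
    output_dir="mask-matching-run",   # hypothetical output directory
    per_device_train_batch_size=8,    # batch size of 8
    gradient_accumulation_steps=4,    # gradient accumulation of 4
    learning_rate=1e-5,               # learning rate of 1e-5
    warmup_ratio=0.2,                 # warm-up ratio of 0.2
    num_train_epochs=20,              # 20 training epochs
)
# The Trainer's default optimizer is AdamW, matching the paper's choice.

def preprocess(examples):
    # Maximum input length limited to 500 tokens, as reported.
    # The "text" field name is a placeholder for the dataset's input column.
    return tokenizer(examples["text"], truncation=True, max_length=500)
```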