Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks

Authors: Bo Li, Wei Ye, Quansen Wang, Wen Zhao, Shikun Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method extensively on 8 NLU tasks with 14 datasets. The experimental results show that Mask Matching significantly outperforms its fine-tuning and conventional prompt-tuning counterparts, achieving state-of-the-art performance on several datasets.
Researcher Affiliation | Academia | ¹National Engineering Research Center for Software Engineering, Peking University; ²School of Software and Microelectronics, Peking University; ³Boston University
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | We use 8 different natural language understanding tasks across 14 datasets to verify the effectiveness of our proposed method. The metric and class numbers for each dataset are shown in Table 1. ... Some of them are selected from the GLUE benchmark (Wang et al. 2019b) and SuperGLUE (Wang et al. 2019a); others are popular in specific research fields, such as entity typing, relation classification, and stance detection.
Dataset Splits | Yes | For a fair comparison, all the datasets and data splits are the same as in previous works. ... We randomly select 10% of the whole training set and keep the development and test sets unchanged. ... We report the performance on the development sets of WiC, QNLI, SNLI, and QQP, as previous works (Liu et al. 2019; He, Gao, and Chen 2021; Wang et al. 2021; Bajaj et al. 2022) only presented the single model's performance on the development set. (A hedged sketch of this loading-and-sampling procedure follows the table.)
Hardware Specification | Yes | We use PyTorch (Paszke et al. 2019) and a Tesla T4 GPU in our experiments.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al. 2019) and specific pre-trained models (RoBERTa-large, BERT-large) from HuggingFace, but does not provide version numbers for software dependencies such as PyTorch itself.
Experiment Setup | Yes | Specifically, we implement a batch size of 8 with a gradient accumulation of 4, and employ the AdamW optimizer (Loshchilov and Hutter 2019) with a learning rate of 1e-5 and a warm-up ratio of 0.2 (Goyal et al. 2017) for all datasets. ... The number of training epochs is set to 20, and the maximum input length is limited to 500. (A hedged training-configuration sketch follows the table.)
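
The paper does not release code, so its exact data pipeline is not documented. The sketch below only illustrates how the quoted low-resource split (a random 10% of the training set, with development and test sets left unchanged) could be reproduced for one of the listed datasets; the use of the HuggingFace `datasets` library, the choice of QNLI, and the sampling seed are assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): load one of the listed GLUE datasets
# and sample 10% of its training split, keeping the development set untouched,
# as described in the paper's low-resource setting.
from datasets import load_dataset

raw = load_dataset("glue", "qnli")  # QNLI is one of the 14 datasets reported

# Randomly select 10% of the training set; seed=42 is an assumption,
# since the paper does not report a sampling seed.
low_resource_train = raw["train"].shuffle(seed=42).select(
    range(int(0.1 * len(raw["train"])))
)

dev_set = raw["validation"]  # development set kept unchanged
print(len(low_resource_train), len(dev_set))
```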
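
Likewise, no training script accompanies the paper, so the following is only a minimal sketch of how the quoted hyperparameters (batch size 8, gradient accumulation 4, AdamW with learning rate 1e-5, warm-up ratio 0.2, 20 epochs, maximum input length 500) map onto a standard HuggingFace `TrainingArguments`/`Trainer` configuration, which uses AdamW by default. It does not reproduce the Mask Matching prompt construction itself; the backbone choice, output directory, and QNLI field names are assumptions for illustration.

```python
# Hedged sketch of the reported training configuration (not the authors' script).
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "roberta-large"  # one of the backbones mentioned in the report
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

args = TrainingArguments(
    output_dir="mask_matching_qnli",   # assumed output path
    per_device_train_batch_size=8,     # batch size of 8
    gradient_accumulation_steps=4,     # gradient accumulation of 4
    learning_rate=1e-5,                # AdamW learning rate of 1e-5
    warmup_ratio=0.2,                  # warm-up ratio of 0.2
    num_train_epochs=20,               # 20 training epochs
)

def tokenize(batch):
    # Maximum input length limited to 500 tokens, as stated in the paper.
    return tokenizer(batch["question"], batch["sentence"],
                     truncation=True, max_length=500)

# Usage (assuming the splits from the previous sketch):
# trainer = Trainer(model=model, args=args,
#                   train_dataset=low_resource_train.map(tokenize, batched=True),
#                   eval_dataset=dev_set.map(tokenize, batched=True))
# trainer.train()
```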