Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks

Authors: Bo Li, Wei Ye, Quansen Wang, Wen Zhao, Shikun Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method extensively on 8 NLU tasks with 14 datasets. The experimental results show that Mask Matching significantly outperforms its fine-tuning and conventional prompt-tuning counterparts, achieving state-of-the-art performance on several datasets.
Researcher Affiliation | Academia | ¹National Engineering Research Center for Software Engineering, Peking University; ²School of Software and Microelectronics, Peking University; ³Boston University
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | We use 8 different natural language understanding tasks across 14 datasets to verify the effectiveness of our proposed method. The metric and class numbers for each dataset are shown in Table 1. ... Some of them are selected from the GLUE benchmark (Wang et al. 2019b) and SuperGLUE (Wang et al. 2019a); others are popular in specific research fields, such as entity typing, relation classification, and stance detection.
Dataset Splits | Yes | For a fair comparison, all the datasets and data splits are the same as in previous works. ... We randomly select 10% of the whole training set and keep the development and test sets unchanged. ... We report the performance on the development sets of WiC, QNLI, SNLI, and QQP, as previous works (Liu et al. 2019; He, Gao, and Chen 2021; Wang et al. 2021; Bajaj et al. 2022) only presented the single model's performance on the development set. (A hedged sketch of this loading-and-sampling procedure follows the table.)
Hardware Specification | Yes | We use PyTorch (Paszke et al. 2019) and a Tesla T4 GPU in our experiments.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al. 2019) and specific pre-trained models (RoBERTa-large, BERT-large) from HuggingFace, but does not provide version numbers for software dependencies such as PyTorch itself.
Experiment Setup | Yes | Specifically, we implement a batch size of 8 with a gradient accumulation of 4, and employ the AdamW optimizer (Loshchilov and Hutter 2019) with a learning rate of 1e-5 and a warm-up ratio of 0.2 (Goyal et al. 2017) for all datasets. ... The number of training epochs is set to 20, and the maximum input length is limited to 500. (A hedged training-configuration sketch follows the table.)
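
The paper does not release code, so its exact data pipeline is not documented. The sketch below only illustrates how the quoted low-resource split (a random 10% of the training set, with development and test sets left unchanged) could be reproduced for one of the listed datasets; the use of the HuggingFace `datasets` library, the choice of QNLI, and the sampling seed are assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): load one of the listed GLUE datasets
# and sample 10% of its training split, keeping the development set untouched,
# as described in the paper's low-resource setting.
from datasets import load_dataset

raw = load_dataset("glue", "qnli")  # QNLI is one of the 14 datasets reported

# Randomly select 10% of the training set; seed=42 is an assumption,
# since the paper does not report a sampling seed.
low_resource_train = raw["train"].shuffle(seed=42).select(
    range(int(0.1 * len(raw["train"])))
)

dev_set = raw["validation"]  # development set kept unchanged
print(len(low_resource_train), len(dev_set))
```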
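
Likewise, no training script accompanies the paper, so the following is only a minimal sketch of how the quoted hyperparameters (batch size 8, gradient accumulation 4, AdamW with learning rate 1e-5, warm-up ratio 0.2, 20 epochs, maximum input length 500) map onto a standard HuggingFace `TrainingArguments`/`Trainer` configuration, which uses AdamW by default. It does not reproduce the Mask Matching prompt construction itself; the backbone choice, output directory, and QNLI field names are assumptions for illustration.

```python
# Hedged sketch of the reported training configuration (not the authors' script).
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "roberta-large"  # one of the backbones mentioned in the report
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

args = TrainingArguments(
    output_dir="mask_matching_qnli",   # assumed output path
    per_device_train_batch_size=8,     # batch size of 8
    gradient_accumulation_steps=4,     # gradient accumulation of 4
    learning_rate=1e-5,                # AdamW learning rate of 1e-5
    warmup_ratio=0.2,                  # warm-up ratio of 0.2
    num_train_epochs=20,               # 20 training epochs
)

def tokenize(batch):
    # Maximum input length limited to 500 tokens, as stated in the paper.
    return tokenizer(batch["question"], batch["sentence"],
                     truncation=True, max_length=500)

# Usage (assuming the splits from the previous sketch):
# trainer = Trainer(model=model, args=args,
#                   train_dataset=low_resource_train.map(tokenize, batched=True),
#                   eval_dataset=dev_set.map(tokenize, batched=True))
# trainer.train()
```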