InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding

Authors: Junda Wu, Tong Yu, Rui Wang, Zhao Song, Ruiyi Zhang, Handong Zhao, Chaochao Lu, Shuai Li, Ricardo Henao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate that InfoPrompt can significantly accelerate the convergence of the prompt tuning and outperform traditional prompt tuning methods. We conduct experiments with datasets of sequence classification from the GLUE benchmark [95], along with those of relation extraction tasks and NER tasks.
Researcher Affiliation | Collaboration | Junda Wu (University of California, San Diego), Tong Yu (Adobe Research), Rui Wang (Duke University), Zhao Song (Adobe Research), Ruiyi Zhang (Adobe Research), Handong Zhao (Adobe Research), Chaochao Lu (University of Cambridge), Shuai Li (Shanghai Jiao Tong University), Ricardo Henao (Duke University; KAUST)
Pseudocode | Yes | Algorithm 1: Our Algorithm.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We conduct experiments with datasets of sequence classification from the GLUE benchmark [95], along with those of relation extraction tasks and NER tasks. We choose four sequence classification tasks from the GLUE benchmark: RTE (Recognizing Textual Entailment, [7]), MRPC (Microsoft Research Paraphrase Corpus, [28]), CoLA (Corpus of Linguistic Acceptability, [98]) and SST-2 (Stanford Sentiment Treebank, [81]). ... We follow the same data splitting strategy for the ACE2005 corpus as the previous work [103, 71]. For the SemEval-2010 tasks, we follow the official data partition [44]. (A data-loading sketch for these tasks follows the table.)
Dataset Splits | Yes | We follow the resource-constrained scenario in [40] that trains each task with only 64 or 256 samples. We follow the same data splitting strategy for the ACE2005 corpus as the previous work [103, 71]. For the SemEval-2010 tasks, we follow the official data partition [44].
Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., 'Python 3.8, PyTorch 1.9').
Experiment Setup | Yes | We experiment with n_p = 1 and n_p = 4 prompt tokens for each task. The prompt tokens are inserted into the template for each task. Similar to [40], we adopt the RoBERTa-large model as our pretrained encoder. We freeze the pretrained parameters and only train the parameters of the prompt head and prompt tokens. During training, we empirically set β = 0.1 and γ = 0.05. The number of negative samples is K = 32. The learning rate is 1e-3 and the batch size is 8. For each task, we report the results after 30 epochs, averaged over 5 random seeds.
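
The Open Datasets and Dataset Splits rows above quote four public GLUE tasks and a resource-constrained regime of 64 or 256 training samples per task. The sketch below is one plausible way to reproduce that setup; the use of the Hugging Face `datasets` library, the `few_shot_split` helper, and the seed value are assumptions not stated in the paper.

```python
# Minimal sketch (assumed tooling, not the authors' code): load the four GLUE
# tasks named in the paper and draw a 64- or 256-example few-shot training set.
from datasets import load_dataset

GLUE_TASKS = ["rte", "mrpc", "cola", "sst2"]  # RTE, MRPC, CoLA, SST-2

def few_shot_split(task: str, num_samples: int = 64, seed: int = 0):
    """Return a small training subset plus the full validation split."""
    data = load_dataset("glue", task)
    train = data["train"].shuffle(seed=seed).select(range(num_samples))
    return train, data["validation"]

if __name__ == "__main__":
    for task in GLUE_TASKS:
        train, dev = few_shot_split(task, num_samples=64)
        print(task, len(train), len(dev))
```

The Experiment Setup row describes soft prompt tuning with a frozen RoBERTa-large encoder, trainable prompt tokens and a prompt head, a learning rate of 1e-3, and a batch size of 8. The following is a minimal sketch of such a setup, assuming PyTorch and the Hugging Face `transformers` library; the `SoftPromptClassifier` class, the way prompts are prepended, and the AdamW optimizer are illustrative choices, and the paper's information-theoretic losses (β, γ, and the K = 32 negative samples) are not implemented here.

```python
# Illustrative prompt tuning scaffold, not the paper's released implementation.
import torch
import torch.nn as nn
from transformers import RobertaModel

class SoftPromptClassifier(nn.Module):
    """Frozen RoBERTa-large with trainable soft prompt tokens and a prompt head."""

    def __init__(self, n_prompt_tokens: int = 4, num_labels: int = 2):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-large")
        for p in self.encoder.parameters():
            p.requires_grad = False  # freeze all pretrained parameters
        hidden = self.encoder.config.hidden_size
        # Trainable soft prompt embeddings, prepended to the token embeddings.
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden) * 0.02)
        # Hypothetical stand-in for the paper's "prompt head" classifier.
        self.prompt_head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # Word embeddings only; positional embeddings are added by the encoder.
        tok_emb = self.encoder.embeddings.word_embeddings(input_ids)
        batch_size = input_ids.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)
        prompt_mask = torch.ones(
            batch_size, prompt.size(1),
            dtype=attention_mask.dtype, device=attention_mask.device,
        )
        full_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        out = self.encoder(inputs_embeds=inputs_embeds, attention_mask=full_mask)
        # Classify from the first prompt position (a simplification).
        return self.prompt_head(out.last_hidden_state[:, 0])

model = SoftPromptClassifier(n_prompt_tokens=4, num_labels=2)
trainable = [p for p in model.parameters() if p.requires_grad]  # prompt + head only
optimizer = torch.optim.AdamW(trainable, lr=1e-3)  # lr 1e-3, batches of 8 as reported
```

Only the prompt embeddings and the prompt head receive gradients here, mirroring the quoted statement that the pretrained parameters are frozen and only the prompt head and prompt tokens are trained.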