Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

Authors: Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on 23 NLP datasets demonstrate that our proposed approach outperforms the state-of-the-art methods, including the full finetuning baseline in some cases, despite only tuning 0.035% as many task-specific parameters.
Researcher Affiliation Collaboration 1The Ohio State University, 2MIT-IBM Watson AI Lab, 3Massachusetts Institute of Technology {wang.9215,sun.397}@osu.edu, {rpanda, leonidka, rsferis}@ibm.com, yoonkim@mit.edu
Pseudocode No The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor are there any code-like formatted procedures.
Open Source Code No The paper includes a link to a 'Project page: https://zhenwang9102.github.io/mpt.html' in a footnote, which is a general project overview page rather than a direct link to a specific source-code repository for the methodology.
Open Datasets Yes As in Asai et al. (2022) we evaluate MPT using 6 datasets with more than 100k annotations as source tasks: MNLI (Williams et al., 2017), QNLI (Demszky et al., 2018), QQP (Wang et al., 2018), SST-2 (Socher et al., 2013), SQuAD (Rajpurkar et al., 2016), and ReCoRD (Zhang et al., 2018). We use 23 datasets from four benchmarks as target tasks: MultiRC (Khashabi et al., 2018), BoolQ (Clark et al., 2019a), WiC (Pilehvar & Camacho-Collados, 2018), WSC (Levesque et al., 2012), and CB (De Marneffe et al., 2019) from SuperGLUE (Wang et al., 2019); RTE (Giampiccolo et al., 2007), CoLA (Warstadt et al., 2019), STS-B (Cer et al., 2017), MRPC (Dolan & Brockett, 2005), MNLI, QQP, QNLI and SST-2 from GLUE (Wang et al., 2018); Natural Questions (Kwiatkowski et al., 2019), HotpotQA (Yang et al., 2018), NewsQA (Trischler et al., 2017) and SearchQA (Dunn et al., 2017) from MRQA (Fisch et al., 2019); WinoGrande (Sakaguchi et al., 2021), Yelp-2 (Zhang et al., 2015), SciTail (Khot et al., 2018) and PAWS-Wiki (Zhang et al., 2019) from the Others benchmark of Asai et al. (2022); and E2E (Novikova et al., 2017) and WebNLG (Gardent et al., 2017) for experiments on adapting to natural language generation tasks.
Dataset Splits Yes For all datasets, we use the development set as the testing set if the original testing set is not publicly available. If the training set is small, we split the original development set into the development and testing set; otherwise, we separate a development set from the training set and use the original development set for testing.
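The split policy quoted above can be sketched as a small helper. This is an illustrative reconstruction, not the authors' code; in particular, the cutoff for a "small" training set and the exact dev/test proportions are assumptions the paper does not state.

```python
def choose_splits(has_public_test: bool, train_size: int,
                  small_threshold: int = 10_000):
    """Return (train, dev, test) split sources per the paper's stated policy.

    Hypothetical sketch: `small_threshold` and the 50/50 dev split
    are assumptions, not values given in the paper.
    """
    if has_public_test:
        # Original test set is public: use the standard splits.
        return ("train", "dev", "test")
    if train_size < small_threshold:
        # Small training set: split the original dev into dev + test.
        return ("train", "dev[:50%]", "dev[50%:]")
    # Large training set: carve a dev set out of train,
    # and use the original dev set for testing.
    return ("train[:-n]", "train[-n:]", "dev")
```

Usage: `choose_splits(False, 2_500)` falls into the small-dataset branch and halves the original development set.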
Hardware Specification No The paper mentions receiving support for 'computational resources on the AiMOS Supercomputer' but does not specify any particular hardware components such as specific GPU or CPU models, or memory details used for the experiments.
Software Dependencies No The paper mentions using the T5 model but does not specify any software dependencies like programming languages, libraries (e.g., PyTorch, TensorFlow), or their corresponding version numbers.
Experiment Setup Yes For source training, we train MPT on the mixture of source tasks for 5 epochs... We train 20 epochs on small datasets, 10 epochs on large (more than 10k examples) datasets, and 5 epochs on the MRQA datasets. During source training, we set the default learning rate as 0.3... For target adaptation, we set the learning rate to 0.3 and 0.4 for the task-shared and task-specific components... We set the default number of tunable tokens per prompt to 100... We set the default batch size for T5-Base to 32; for the model-scaling experiments, the batch sizes for T5-Small and T5-Large are 100 and 12, respectively. The default input length for most tasks is 256, except for MultiRC and the MRQA benchmarks, which use input lengths of 348 and 512. We set the distillation loss coefficient λ in Equation 5 to 0.9 and keep it fixed for all our experiments.
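For quick reference, the hyperparameters reported in the experiment-setup row can be collected into a single config. This is a summary sketch only: the key names are illustrative and do not come from the authors' code.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are hypothetical; values are taken from the quoted text.
MPT_CONFIG = {
    "source_epochs": 5,
    "target_epochs": {"small": 20, "large": 10, "mrqa": 5},
    "lr_source": 0.3,
    "lr_target_shared": 0.3,      # task-shared component
    "lr_target_specific": 0.4,    # task-specific component
    "prompt_length": 100,         # tunable tokens per prompt
    "batch_size": {"t5-small": 100, "t5-base": 32, "t5-large": 12},
    "max_input_length": {"default": 256, "multirc": 348, "mrqa": 512},
    "distill_lambda": 0.9,        # distillation coefficient λ in Eq. 5
}
```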