Multitask Prompted Training Enables Zero-Shot Task Generalization
Authors: Victor Sanh, Albert Webson, Colin Raffel, Stephen Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Teven Le Scao, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M Rush
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16× its size. |
| Researcher Affiliation | Collaboration | Victor Sanh (Hugging Face); Albert Webson (Brown University); Colin Raffel (Hugging Face); Stephen H. Bach (Brown & Snorkel AI); Lintang Sutawika (BigScience); Zaid Alyafeai (KFUPM); Antoine Chaffin (IRISA & IMATAG); Arnaud Stiegler (Hyperscience); Teven Le Scao (Hugging Face); Arun Raja (I2R, Singapore); Manan Dey (SAP); M Saiful Bari (NTU, Singapore); Canwen Xu (UCSD & Hugging Face); Urmish Thakker (SambaNova Systems); Shanya Sharma (Walmart Labs); Eliza Szczechla (BigScience); Taewoon Kim (VU Amsterdam); Gunjan Chhablani (BigScience); Nihal V. Nayak (Brown University); Debajyoti Datta (University of Virginia); Jonathan Chang (ASUS); Mike Tian-Jian Jiang (ZEALS, Japan); Han Wang (NYU); Matteo Manica (IBM Research); Sheng Shen (UC Berkeley); Zheng-Xin Yong (Brown University); Harshit Pandey (BigScience); Michael McKenna (Parity); Rachel Bawden (Inria, France); Thomas Wang (Inria, France); Trishala Neeraj (BigScience); Jos Rozen (Naver Labs Europe); Abheesht Sharma (BITS Pilani, India); Andrea Santilli (University of Rome); Thibault Fevry (BigScience); Jason Alan Fries (Stanford & Snorkel AI); Ryan Teehan (Charles River Analytics); Tali Bers (Brown University); Stella Biderman (Booz Allen & EleutherAI); Leo Gao (EleutherAI); Thomas Wolf (Hugging Face); Alexander M. Rush (Hugging Face) |
| Pseudocode | No | The paper describes its methods in narrative form and with figures, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | All trained models are available at https://github.com/bigscience-workshop/t-zero, and all prompts are available at https://github.com/bigscience-workshop/promptsource. |
| Open Datasets | Yes | All experiments use datasets in the Hugging Face datasets library (Lhoest et al., 2021). |
| Dataset Splits | Yes | We perform checkpoint selection by choosing the checkpoint that yields the highest score on the validation splits of our training datasets. |
| Hardware Specification | Yes | We are grateful for the TPU Research Cloud program which generously provided TPU credits to Hugging Face. Those credits were used to train all the models from this paper. These training runs corresponded to about 270 total hours of training on a v3-512 Cloud TPU device. |
| Software Dependencies | No | The paper mentions software components like T5, T5+LM, and the Adafactor optimizer, but does not provide specific version numbers for these or for other relevant software libraries/frameworks. |
| Experiment Setup | Yes | We truncate input and target sequences to 1024 and 256 tokens, respectively. [...] We use a batch size of 1024 sequences (corresponding to 2^20 total input tokens per batch) and the Adafactor optimizer (Shazeer and Stern, 2018). Following standard practice for fine-tuning T5, we use a learning rate of 1e-3 and a dropout rate of 0.1. |
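
Building on the "Open Source Code" row above, the sketch below shows roughly how the released checkpoints can be used for zero-shot inference. It is a minimal example, not the authors' code: it assumes the `bigscience/T0_3B` checkpoint name used on the Hugging Face Hub (larger variants such as `bigscience/T0pp` follow the same pattern) and a recent version of the `transformers` library.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed Hub checkpoint name for the smaller released T0 variant.
MODEL_NAME = "bigscience/T0_3B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Zero-shot inference: the task is specified entirely through a natural-language prompt.
prompt = ("Is this review positive or negative? "
          "Review: this is the best cast iron skillet you will ever buy.")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```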
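
Similarly, for the "Open Datasets" row: the datasets come from the Hugging Face `datasets` library and the prompts from `promptsource`. The sketch below assumes the `DatasetTemplates` interface documented in the `promptsource` repository (template names and the exact return value of `apply` can vary between versions) and uses SuperGLUE RTE, one of the held-out evaluation tasks, as an example.

```python
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

# Load an evaluation dataset from the Hugging Face datasets library.
rte = load_dataset("super_glue", "rte", split="validation")

# Fetch the prompt templates that promptsource ships for this dataset
# and pick one by name (the first available name is used here for brevity).
rte_templates = DatasetTemplates("super_glue", "rte")
template = rte_templates[rte_templates.all_template_names[0]]

# Render a single example into a prompted (input, target) pair of plain strings.
prompted_input, target = template.apply(rte[0])
print(prompted_input)
print("target:", target)
```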
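
The "Dataset Splits" row quotes the model-selection rule: the chosen checkpoint is the one with the highest score on the validation splits of the training datasets. A trivial sketch of that rule follows; the checkpoint names and scores are hypothetical placeholders, not values from the paper.

```python
def select_checkpoint(validation_scores):
    """Return the checkpoint with the highest aggregate score on the
    validation splits of the training datasets (the selection rule quoted
    in the 'Dataset Splits' row)."""
    return max(validation_scores, key=validation_scores.get)


# Hypothetical usage: each score would come from evaluating a saved checkpoint.
best = select_checkpoint({"step-05000": 0.61, "step-10000": 0.64, "step-15000": 0.63})
print(best)  # -> "step-10000"
```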
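
Finally, the "Experiment Setup" row collects the reported hyperparameters: truncation to 1024 input / 256 target tokens, batches of 1024 sequences (2^20 input tokens), the Adafactor optimizer, a learning rate of 1e-3, and a dropout rate of 0.1. The original runs used TPUs with the authors' own training stack; the snippet below is only a hedged PyTorch-side sketch of those settings using the Adafactor implementation shipped with `transformers`, starting from one of Google's LM-adapted T5 checkpoints (the `google/t5-xl-lm-adapt` name and the `"input"`/`"target"` field names are assumptions).

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.optimization import Adafactor

# Assumed starting point: an LM-adapted T5 checkpoint, as fine-tuned in the paper.
BASE_MODEL = "google/t5-xl-lm-adapt"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
# Dropout rate of 0.1 (T5's default), set explicitly to mirror the reported setup.
model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL, dropout_rate=0.1)

# Truncation lengths reported in the paper: 1024 input tokens, 256 target tokens.
MAX_INPUT_LENGTH, MAX_TARGET_LENGTH = 1024, 256
# 1024 sequences per batch -> 1024 * 1024 = 2**20 input tokens per batch.
BATCH_SIZE = 1024

def encode(example):
    """Tokenize one already-prompted example with the reported truncation lengths.
    The "input"/"target" field names are hypothetical."""
    model_inputs = tokenizer(example["input"], max_length=MAX_INPUT_LENGTH, truncation=True)
    labels = tokenizer(example["target"], max_length=MAX_TARGET_LENGTH, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Adafactor with a fixed learning rate of 1e-3, the standard T5 fine-tuning recipe.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
```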