ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
Authors: Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Via extensive experiments, we show that ExT5 outperforms strong T5 baselines on SuperGLUE, GEM, Rainbow, Closed-Book QA tasks, and several tasks outside of ExMix. ExT5 also significantly improves sample efficiency while pre-training. |
| Researcher Affiliation | Industry | Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler. Google Research, DeepMind. {aribandi, yitay}@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All of the modeling and training code used for ExT5 and its variants is already open-sourced as a part of the Mesh TensorFlow (Shazeer et al., 2018) and T5 (Raffel et al., 2020) libraries. |
| Open Datasets | Yes | Additionally, ExMix is composed of datasets that are already publicly available. (Referring to Table 11 and its subsequent text, which lists datasets with citations such as Wang et al. (2019b) and See et al. (2017).) |
| Dataset Splits | Yes | We report test set results on all datasets except CommonGen and ToTTo, on which we report validation scores. ... For each dataset, we select the best model checkpoint using the average of BLEU, ROUGE-1, ROUGE-2 and ROUGE-L scores on the validation set. (A sketch of this checkpoint-selection rule appears below the table.) |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU model, CPU type, TPU version) used for running the experiments. It mentions training for '1M total steps with a batch size of 2048' but no hardware specifics. |
| Software Dependencies | No | Our models were trained using Mesh TensorFlow (Shazeer et al., 2018) using the T5 library (Raffel et al., 2020). While software names are mentioned, specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | We pre-train our models for 1M total steps with a batch size of 2048 and sequence length 512... For optimization, we use Adafactor with an inverse square root learning rate schedule that kicks in after a constant phase of 0.01 for 10k steps. ...Fine-tuning... ExT5 generally benefited from a smaller learning rate while fine-tuning (10^-4 worked well for ExT5 vs 10^-3 for T5 variants). (A minimal sketch of this learning-rate schedule appears below the table.) |
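
To make the quoted pre-training schedule concrete, here is a minimal Python sketch, assuming the standard T5-style inverse square-root schedule: the rate is held at 1/sqrt(10,000) = 0.01 for the first 10k steps and then decays as 1/sqrt(step). The function name and the assertions are illustrative and not taken from the ExT5 codebase.

```python
import math

def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Inverse square-root learning-rate schedule with a constant phase.

    For step <= warmup_steps the rate stays at 1/sqrt(warmup_steps), i.e.
    0.01 for the 10k-step phase quoted above; afterwards it decays as
    1/sqrt(step).
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))

# Constant phase at 0.01 for the first 10k steps, then inverse-sqrt decay.
assert abs(inverse_sqrt_lr(1) - 0.01) < 1e-12
assert abs(inverse_sqrt_lr(10_000) - 0.01) < 1e-12
assert abs(inverse_sqrt_lr(1_000_000) - 0.001) < 1e-12  # decayed by 1M steps
```

The fine-tuning rates quoted in the row (10^-4 for ExT5, 10^-3 for T5 variants) would be used in place of this decaying schedule during fine-tuning; the paper excerpt does not specify a fine-tuning schedule beyond those values.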
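
The Dataset Splits row describes selecting the best checkpoint by the average of BLEU, ROUGE-1, ROUGE-2 and ROUGE-L on the validation set. Below is a minimal sketch of that selection rule; the dictionary layout, metric keys, and example scores are hypothetical and only illustrate the averaging-and-argmax step.

```python
def select_best_checkpoint(val_metrics: dict[int, dict[str, float]]) -> int:
    """Return the checkpoint step with the highest mean of the four metrics.

    `val_metrics` maps a checkpoint step to its validation scores under the
    (hypothetical) keys "bleu", "rouge1", "rouge2", and "rougeL".
    """
    keys = ("bleu", "rouge1", "rouge2", "rougeL")

    def mean_score(step: int) -> float:
        scores = val_metrics[step]
        return sum(scores[k] for k in keys) / len(keys)

    return max(val_metrics, key=mean_score)

# Hypothetical validation scores for three checkpoints on one dataset.
val_metrics = {
    100_000: {"bleu": 18.2, "rouge1": 41.0, "rouge2": 19.3, "rougeL": 38.5},
    200_000: {"bleu": 19.0, "rouge1": 42.1, "rouge2": 20.0, "rougeL": 39.2},
    300_000: {"bleu": 18.7, "rouge1": 41.8, "rouge2": 19.8, "rougeL": 39.0},
}
print(select_best_checkpoint(val_metrics))  # -> 200000
```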