HyperPrompt: Prompt-based Task-Conditioning of Transformers
Authors: Yun He, Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, Yaguang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performances over strong T5 multi-task learning baselines and parameter-efficient adapter variants including Prompt-Tuning and HyperFormer++ on Natural Language Understanding benchmarks of GLUE and SuperGLUE across many model sizes. |
| Researcher Affiliation | Collaboration | Yun He*1, Huaixiu Steven Zheng*2, Yi Tay2, Jai Gupta2, Yu Du2, Vamsi Aribandi2, Zhe Zhao2, YaGuang Li2, Zhao Chen3, Donald Metzler2, Heng-Tze Cheng2, Ed H. Chi2. *Equal contribution. 1Texas A&M University, work done as an intern at Google; 2Google Research; 3Waymo LLC. |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper mentions using 'Mesh TensorFlow (Shazeer et al., 2018)' and the 'T5 library (Raffel et al., 2019)', with footnotes linking to their respective GitHub repositories. However, it does not provide a statement or link releasing the code for the HyperPrompt method described in this paper. |
| Open Datasets | Yes | Datasets. We evaluate the performance of the models on GLUE (Wang et al., 2018) and SuperGLUE (Wang et al., 2019) respectively. |
| Dataset Splits | Yes | We save a checkpoint every 2000 steps for all models and follow the same convention as Raffel et al. (2019) in selecting the best checkpoint for each task. ...To calculate the attention mass over hyper-prompts per layer, we averaged the hyper-prompt attention softmax scores across 100 validation examples and each attention head in a layer... |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for the experimental setup. |
| Software Dependencies | No | The paper mentions 'Mesh TensorFlow (Shazeer et al., 2018)', the 'T5 library (Raffel et al., 2019)', and the 'Adam optimizer (Kingma & Ba, 2014)', but no version numbers for these software components are provided in the text. |
| Experiment Setup | Yes | For all experiments, we train models 300K steps with a batch size of 128 and each batch is a mixture which samples each task proportionately to the number of examples in the dataset. Learning rate is a constant of 1e-3 with Adam optimizer (Kingma & Ba, 2014). For hyper-parameter tuning, the length of prompt l is selected from {12, 16, 20, 24} at the encoder and {2, 4, 6, 8, 10, 12, 14, 16} at the decoder. The bottleneck dimension b in the transform matrices is set to d/r, where d is the model dimension of the T5 models and r is a reduction factor selected from {16, 32, 64}. The dimension t of the layer-aware task embedding is selected from {32, 64, 128}. |
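
The experiment-setup row quotes the key hyper-parameters (prompt length l, bottleneck dimension b = d/r, layer-aware task-embedding dimension t). Below is a minimal NumPy sketch of how a hypernetwork bottleneck of this shape could map a layer-aware task embedding to l task-conditioned prompt vectors of model dimension d. It is not the authors' released code: the concrete sizes, variable names, initialization scale, and the weight-generation scheme are assumptions inferred only from the quoted hyper-parameters.

```python
# Hypothetical sketch of hyper-prompt generation (not the authors' code).
# Sizes follow the quoted search ranges: b = d / r, t from {32, 64, 128},
# l from the encoder range {12, 16, 20, 24}.
import numpy as np

d = 512          # assumed T5 model dimension
r = 32           # reduction factor, from {16, 32, 64}
b = d // r       # bottleneck dimension of the transform matrices
t = 64           # layer-aware task-embedding dimension, from {32, 64, 128}
l = 16           # encoder prompt length, from {12, 16, 20, 24}

rng = np.random.default_rng(0)

# Global prompts shared across tasks (illustrative initialization).
global_prompts = rng.normal(scale=0.02, size=(l, d))

# Layer-aware task embedding for one (task, layer) pair.
task_embedding = rng.normal(scale=0.02, size=(t,))

# Hypothetical hypernetwork: generate down/up projection weights from the
# task embedding, then pass the global prompts through the bottleneck.
W_gen_down = rng.normal(scale=0.02, size=(t, d * b))
W_gen_up = rng.normal(scale=0.02, size=(t, b * d))
W_down = (task_embedding @ W_gen_down).reshape(d, b)   # d -> b
W_up = (task_embedding @ W_gen_up).reshape(b, d)       # b -> d

# Task-conditioned hyper-prompts of shape (l, d).
hyper_prompts = np.maximum(global_prompts @ W_down, 0.0) @ W_up
print(hyper_prompts.shape)  # (16, 512)
```

In the paper, the resulting hyper-prompts are prepended to the keys and values of self-attention in each Transformer block; the sketch stops at producing the l x d prompt block and leaves that injection step out.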