HyperTuning: Toward Adapting Large Language Models without Back-propagation

Authors: Jason Phang, Yi Mao, Pengcheng He, Weizhu Chen

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate HyperT5 on P3, MetaICL, and Super-NaturalInstructions datasets, and show that it can effectively generate parameters for unseen tasks.
Researcher Affiliation | Collaboration | Center for Data Science, New York University, NY, USA; EleutherAI; Microsoft Azure AI, WA, USA. Correspondence to: Jason Phang <jasonphang@nyu.edu>.
Pseudocode | Yes | Additional architectural details and pseudo-code for both HyperT5-Prefix and HyperT5-LoRA models can be found in Appendix C.
Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code for the described methodology.
Open Datasets | Yes | To demonstrate the generality of our approach, we conduct experiments on three different multi-task training datasets, each with different held-out tasks and evaluation protocols: Public Pool of Prompts (P3) (Sanh et al., 2022) [...] MetaICL (Min et al., 2022) [...] Super-NaturalInstructions (S-NI) (Wang et al., 2022).
Dataset Splits | No | The paper mentions 'held-out tasks' for evaluation and 'dev' in table headers, which imply a validation/development set, but it gives no explicit percentages or counts for training/validation/test splits, nor does it state the size of the 'dev' set or whether it serves as a validation set for hyperparameter tuning separate from the test set.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., specific GPU or CPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions software such as '8-bit Adam', 'ZeRO', and 'Transformers' but does not provide specific version numbers for these components.
Experiment Setup | Yes | All experiments are trained with 8-bit Adam (Dettmers et al., 2022), a batch size of 256, a learning rate of 5e-5, and a linear decay schedule. Training was performed with ZeRO (Rajbhandari et al., 2020) and Transformers (Wolf et al., 2020). For hypermodels, the hypermodel's max input sequence length is 1024 tokens and the downstream model's max input sequence length is 384 tokens. [...] The max target sequence length is set to 128 for all experiments.
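
For concreteness, the hyperparameters quoted in the Experiment Setup row could be expressed roughly as below with Hugging Face Transformers' TrainingArguments. This is a hedged sketch, not the authors' configuration (no code is released): the output directory, the per-device batch split, the device count, and the DeepSpeed config file name are assumptions; only the values themselves (effective batch size 256, learning rate 5e-5, linear decay, 8-bit Adam, ZeRO, and the sequence-length limits) come from the quoted setup.

```python
# Hypothetical sketch of the quoted training setup using Hugging Face
# Transformers' TrainingArguments. Only the hyperparameter values are taken
# from the paper's quoted setup; everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hypert5_run",        # hypothetical output directory
    per_device_train_batch_size=8,   # assumed split: 8 per device x 4 accumulation
    gradient_accumulation_steps=4,   #   steps x 8 devices = effective batch size 256
    learning_rate=5e-5,              # learning rate from the quoted setup
    lr_scheduler_type="linear",      # linear decay schedule
    optim="adamw_bnb_8bit",          # 8-bit Adam (Dettmers et al., 2022)
    # deepspeed="zero_config.json",  # ZeRO (Rajbhandari et al., 2020) would be
                                     #   enabled via a DeepSpeed config file
)

# Sequence-length limits from the quoted setup, enforced at tokenization time:
HYPERMODEL_MAX_INPUT_LEN = 1024  # hypermodel max input length (tokens)
DOWNSTREAM_MAX_INPUT_LEN = 384   # downstream model max input length (tokens)
MAX_TARGET_LEN = 128             # max target length for all experiments
```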