Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

Authors: Ge Yang, Edward Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify µTransfer on Transformer and ResNet. For example, 1) by transferring pretraining HPs from a model of 13M parameters, we outperform published numbers of BERT-large (350M parameters)... 2) by transferring from 40M parameters, we outperform published numbers of the 6.7B GPT-3 model...
Researcher Affiliation | Industry | Microsoft Corporation; OpenAI
Pseudocode | Yes | Algorithm 1 Tuning a Large Target Model via µTransfer
Open Source Code | Yes | A PyTorch implementation of our technique can be found at github.com/microsoft/mup.
Open Datasets | Yes | MLPs with different hidden sizes trained for 20 epochs on CIFAR-10 using SGD.
Dataset Splits | Yes | We pick the HP combination that achieves the lowest validation loss for each trial.
Hardware Specification | Yes | All of our experiments are run on V100 GPUs.
Software Dependencies | No | We release a PyTorch [27] package for implementing µTransfer painlessly.
Experiment Setup | Yes | The models are trained on CIFAR-10 for 20 epochs, which is more than enough to ensure convergence.
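Regarding the Open Source Code row above, the following is a minimal usage sketch of the referenced mup package, based on its publicly documented interface (MuReadout, set_base_shapes, MuAdam). The model architecture, widths, and learning rate shown here are illustrative assumptions, not values taken from the paper.

```python
# Illustrative µTransfer-style workflow with the mup package (not the paper's exact setup).
import torch.nn as nn
import torch.nn.functional as F
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(3072, width)       # e.g. flattened 3x32x32 CIFAR-10 images
        self.fc2 = nn.Linear(width, width)
        self.readout = MuReadout(width, 10)     # µP-aware output layer in place of nn.Linear

    def forward(self, x):
        return self.readout(F.relu(self.fc2(F.relu(self.fc1(x)))))

base   = MLP(width=256)    # small base model whose shapes anchor the parametrization
delta  = MLP(width=512)    # used to infer which dimensions grow with width
target = MLP(width=8192)   # large target model that reuses HPs tuned on the small proxy

set_base_shapes(target, base, delta=delta)      # register µP shape information on the target

# Hyperparameters tuned on the small proxy (e.g. the learning rate) are reused unchanged.
opt = MuAdam(target.parameters(), lr=1e-3)      # illustrative value, not from the paper
```

The design idea this sketch illustrates is the paper's central claim: under µP, HPs such as the learning rate found on a narrow base model remain near-optimal as width grows, so the large target model can be trained with those HPs without further tuning.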