Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Authors: Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify µTransfer on Transformer and ResNet. For example, 1) by transferring pretraining HPs from a model of 13M parameters, we outperform published numbers of BERT-large (350M parameters)... 2) by transferring from 40M parameters, we outperform published numbers of the 6.7B GPT-3 model... |
| Researcher Affiliation | Industry | Microsoft Corporation; OpenAI |
| Pseudocode | Yes | Algorithm 1 Tuning a Large Target Model via µTransfer |
| Open Source Code | Yes | A PyTorch implementation of our technique can be found at github.com/microsoft/mup. A usage sketch follows the table. |
| Open Datasets | Yes | MLP with different hidden sizes trained for 20 epochs on CIFAR-10 using SGD. |
| Dataset Splits | Yes | We pick the HP combination that achieves the lowest validation loss for each trial. |
| Hardware Specification | Yes | All of our experiments are run on V100 GPUs. |
| Software Dependencies | No | We release a PyTorch [27] package for implementing µTransfer painlessly. |
| Experiment Setup | Yes | The models are trained on CIFAR-10 for 20 epochs, which is more than enough to ensure convergence. |
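
The Pseudocode and Open Source Code rows point at Algorithm 1 (tune a small proxy model, then transfer its hyperparameters to the large target) and the github.com/microsoft/mup package. The sketch below illustrates that workflow following the package's documented usage pattern; the MLP architecture, the widths, and the learning-rate value are assumptions chosen for illustration, not the paper's exact experimental configuration.

```python
# Minimal sketch of the muTransfer workflow with the `mup` package (pip install mup).
# Widths, architecture, and the learning rate below are illustrative assumptions.
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuSGD


class MLP(nn.Module):
    def __init__(self, width: int, d_in: int = 3072, d_out: int = 10):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(d_in, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        # The output layer uses MuReadout so it scales correctly under muP.
        self.readout = MuReadout(width, d_out)

    def forward(self, x):
        return self.readout(self.body(x))


# Base and delta models tell mup which dimensions count as "width" and should scale.
base_model = MLP(width=64)
delta_model = MLP(width=128)

# Small proxy model: sweep hyperparameters (e.g. learning rate) cheaply here.
proxy = MLP(width=256)
set_base_shapes(proxy, base_model, delta=delta_model)

# Large target model: reuse the proxy's best hyperparameters zero-shot.
target = MLP(width=8192)
set_base_shapes(target, base_model, delta=delta_model)

# mup's optimizers apply the muP per-layer learning-rate scaling rules.
best_lr_from_proxy = 0.1  # assumed value found by sweeping on the proxy
optimizer = MuSGD(target.parameters(), lr=best_lr_from_proxy)
```

In this pattern the only hyperparameter search happens on the narrow proxy; the wide target is trained once with the transferred settings, which is what makes the transfer "zero-shot."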