Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Authors: Ge Yang, Edward Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify µTransfer on Transformer and ResNet. For example, 1) by transferring pretraining HPs from a model of 13M parameters, we outperform published numbers of BERT-large (350M parameters)... 2) by transferring from 40M parameters, we outperform published numbers of the 6.7B GPT-3 model... |
| Researcher Affiliation | Industry | Microsoft Corporation, OpenAI |
| Pseudocode | Yes | Algorithm 1 Tuning a Large Target Model via µTransfer |
| Open Source Code | Yes | A PyTorch implementation of our technique can be found at github.com/microsoft/mup. |
| Open Datasets | Yes | MLP with different hidden sizes trained for 20 epochs on CIFAR-10 using SGD. |
| Dataset Splits | Yes | We pick the HP combination that achieves the lowest validation loss for each trial. |
| Hardware Specification | Yes | All of our experiments are run on V100 GPUs. |
| Software Dependencies | No | We release a PyTorch [27] package for implementing µTransfer painlessly. |
| Experiment Setup | Yes | The models are trained on CIFAR-10 for 20 epochs, which is more than enough to ensure convergence. |
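
The Dataset Splits row quotes a selection rule: for each trial, keep the hyperparameter (HP) combination with the lowest validation loss. The following is a minimal sketch of that rule only, not the authors' code. The function name `pick_best_hp`, the HP names, the loss values, and the choice to skip diverged runs are all assumptions made for illustration.

```python
import math


def pick_best_hp(trial_results):
    """Return the (hp, val_loss) pair with the lowest validation loss.

    trial_results: list of (hp_dict, val_loss) pairs from one trial.
    Runs with a NaN or infinite loss are skipped (an assumption of this sketch).
    """
    finite = [(hp, loss) for hp, loss in trial_results if math.isfinite(loss)]
    if not finite:
        raise ValueError("no run produced a finite validation loss")
    return min(finite, key=lambda pair: pair[1])


# Hypothetical results for one trial of a search on a small proxy model.
trial = [
    ({"lr": 1e-3, "init_std": 0.02}, 2.31),
    ({"lr": 3e-3, "init_std": 0.05}, 2.07),
    ({"lr": 1e-2, "init_std": 0.10}, float("nan")),  # diverged run
]

best_hp, best_loss = pick_best_hp(trial)
print(best_hp, best_loss)
```

In a µTransfer-style workflow, the selected combination would then be reused on the larger target model; that step depends on the parametrization described in the paper and is not shown here.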