Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate if neural networks initialized with the parameters w_pred predicted by GHNs obtain high performance without any training (Eq. 2) and after fine-tuning (Eq. 3). We focus on a large-scale ImageNet setting, but also evaluate in a transfer learning setting from ImageNet to few-shot image classification and object detection tasks.
Researcher Affiliation | Collaboration | 1 Samsung SAIT AI Lab, Montreal; 2 Samsung Advanced Institute of Technology (SAIT), South Korea; 3 Mila, Université de Montréal; 4 Canada CIFAR AI Chair.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We significantly scale up our Transformer-based GHN (GHN-3) and we release our largest model achieving the best results at https://github.com/SamsungSAILMontreal/ghn3 (Fig. 1). (A usage sketch follows the table.)
Open Datasets | Yes | We train the GHNs on the ILSVRC-2012 ImageNet dataset (Russakovsky et al., 2015) with 1.28M training and 50K validation images of the 1k classes.
Dataset Splits | Yes | We train the GHNs on the ILSVRC-2012 ImageNet dataset (Russakovsky et al., 2015) with 1.28M training and 50K validation images of the 1k classes. (A loading sketch follows the table.)
Hardware Specification | Yes | Train time is for GHNs with m = 8 and is measured on 4x NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions 'automatic mixed precision in PyTorch' but does not specify the version number of PyTorch or other software dependencies.
Experiment Setup | Yes | We train the GHNs on the ILSVRC-2012 ImageNet dataset (Russakovsky et al., 2015) ... for 75 epochs using AdamW (Loshchilov & Hutter, 2017), initial learning rate 4e-4 decayed using cosine scheduling, weight decay λ=1e-2, predicted parameter regularization γ=3e-5 (Eq. 10), batch size b=128 and automatic mixed precision in PyTorch (Paszke et al., 2019). (A configuration sketch follows the table.)
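The Research Type and Open Source Code rows describe predicting parameters w_pred for an architecture and evaluating it without any training (Eq. 2) or after fine-tuning (Eq. 3). Below is a minimal sketch of that setting. It assumes the released ghn3 package exposes a from_pretrained loader and that a loaded GHN can be called on a torchvision model; the checkpoint name is hypothetical, and the repository's actual API may differ.

```python
import torch
import torchvision

# Assumption: the ghn3 package from https://github.com/SamsungSAILMontreal/ghn3
# provides a `from_pretrained` loader; the checkpoint name below is hypothetical.
from ghn3 import from_pretrained

ghn = from_pretrained('ghn3xlm16.pt')              # a trained GHN-3 (hypothetical checkpoint name)
model = torchvision.models.resnet50(weights=None)  # an architecture to initialize

with torch.no_grad():
    model = ghn(model)  # assumed call: predict all parameters w_pred of `model` (Eq. 2 setting)

model.eval()
# `model` can now be evaluated on ImageNet validation images without any training,
# or fine-tuned further as in the Eq. 3 setting.
```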
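The Open Datasets and Dataset Splits rows refer to the standard ILSVRC-2012 splits (1.28M training and 50K validation images over 1k classes). A minimal loading sketch with torchvision follows; the root path and transforms are placeholders, and torchvision assumes the ImageNet archives are already downloaded.

```python
import torchvision
import torchvision.transforms as T

# Common torchvision-style preprocessing; these values are standard defaults, not taken from the paper.
train_tf = T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip(), T.ToTensor()])
val_tf = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

# Assumes the ILSVRC-2012 archives are already placed under ./imagenet
# (torchvision.datasets.ImageNet does not download them).
train_set = torchvision.datasets.ImageNet('./imagenet', split='train', transform=train_tf)  # ~1.28M images
val_set = torchvision.datasets.ImageNet('./imagenet', split='val', transform=val_tf)        # 50K images
print(len(train_set.classes))  # 1000 classes
```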
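The Experiment Setup row quotes the GHN training hyperparameters. The sketch below wires those settings into a standard PyTorch loop; the model, data, and loss are random stand-ins (the paper's actual objective over sampled architectures and its Eq. 10 regularizer are not reproduced), so only the optimizer, cosine schedule, weight decay, batch size, epoch count, and mixed-precision usage come from the paper.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# Stand-ins so the configuration is runnable: a tiny model and random data in place of
# the GHN-3 model and the ImageNet loader. Only the hyperparameters come from the paper.
ghn = torch.nn.Linear(512, 1000)
train_loader = [(torch.randn(128, 512), torch.randint(0, 1000, (128,))) for _ in range(10)]
criterion = torch.nn.CrossEntropyLoss()

epochs = 75  # paper: 75 epochs, batch size b = 128
opt = torch.optim.AdamW(ghn.parameters(), lr=4e-4, weight_decay=1e-2)  # initial lr 4e-4, wd 1e-2
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)  # cosine decay
scaler = GradScaler()  # PyTorch automatic mixed precision, as mentioned in the paper

for epoch in range(epochs):
    for x, y in train_loader:
        opt.zero_grad(set_to_none=True)
        with autocast():
            loss = criterion(ghn(x), y)  # stand-in loss; the paper's objective also adds a
                                         # predicted-parameter regularizer with gamma = 3e-5 (Eq. 10)
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()
    sched.step()
```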