Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate if neural networks initialized with the parameters w_pred predicted by GHNs obtain high performance without any training (Eq. 2) and after fine-tuning (Eq. 3). We focus on a large-scale ImageNet setting, but also evaluate in a transfer learning setting from ImageNet to few-shot image classification and object detection tasks. |
| Researcher Affiliation | Collaboration | Samsung SAIT AI Lab, Montreal; Samsung Advanced Institute of Technology (SAIT), South Korea; Mila, Université de Montréal; Canada CIFAR AI Chair. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We significantly scale up our Transformer-based GHN (GHN-3) and we release our largest model achieving the best results at https://github.com/SamsungSAILMontreal/ghn3 (Fig. 1). A hedged usage sketch follows the table. |
| Open Datasets | Yes | We train the GHNs on the ILSVRC-2012 ImageNet dataset (Russakovsky et al., 2015) with 1.28M training and 50K validation images of the 1k classes. A data-loading sketch follows the table. |
| Dataset Splits | Yes | We train the GHNs on the ILSVRC-2012 ImageNet dataset (Russakovsky et al., 2015) with 1.28M training and 50K validation images of the 1k classes. |
| Hardware Specification | Yes | Train time is for GHNs with m = 8 and is measured on 4x NVIDIA-A100 GPUs. |
| Software Dependencies | No | The paper mentions 'automatic mixed precision in PyTorch' but does not specify the version number of PyTorch or other software dependencies. |
| Experiment Setup | Yes | We train the GHNs on the ILSVRC-2012 ImageNet dataset (Russakovsky et al., 2015) ... for 75 epochs using AdamW (Loshchilov & Hutter, 2017), initial learning rate 4e-4 decayed using the cosine scheduling, weight decay λ=1e-2, predicted parameter regularization γ=3e-5 (Eq. 10), batch size b=128 and automatic mixed precision in PyTorch (Paszke et al., 2019). A training-setup sketch follows the table. |
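
For the "Research Type" and "Open Source Code" rows: GHN-3 predicts all parameters of a given architecture in a single forward pass, after which the network can be evaluated with no training (Eq. 2) or fine-tuned (Eq. 3). Below is a minimal sketch of that workflow; the `from_pretrained` loader and the checkpoint name are assumptions about the ghn3 package's interface and should be checked against the README at https://github.com/SamsungSAILMontreal/ghn3.

```python
import torch
import torchvision

# Assumed interface of the released ghn3 package (verify against the repository README):
# `from_pretrained` and the checkpoint name below are assumptions, not a confirmed API.
from ghn3 import from_pretrained

ghn = from_pretrained('ghn3xlm16.pt')        # largest released GHN-3 (assumed checkpoint name)
net = torchvision.models.resnet50()          # any ImageNet architecture
net = ghn(net)                               # predict all parameters in one forward pass

# No-training evaluation (Eq. 2): plain inference with the predicted parameters.
net.eval()
with torch.no_grad():
    logits = net(torch.randn(1, 3, 224, 224))   # dummy input standing in for ImageNet val images
print(logits.argmax(dim=1))

# Fine-tuning (Eq. 3) would proceed as ordinary ImageNet training, starting from `net`.
```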
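
For the "Open Datasets" and "Dataset Splits" rows: the quoted split is the standard ILSVRC-2012 split exposed by torchvision. A minimal loading sketch, assuming the official archives are already present under a placeholder root `data/imagenet`:

```python
import torchvision
import torchvision.transforms as T

# Standard ILSVRC-2012 splits: ~1.28M training and 50K validation images over 1000 classes.
# 'data/imagenet' is a placeholder root; torchvision expects the official archives there.
transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
train_set = torchvision.datasets.ImageNet('data/imagenet', split='train', transform=transform)
val_set = torchvision.datasets.ImageNet('data/imagenet', split='val', transform=transform)
print(len(train_set), len(val_set), len(train_set.classes))   # ~1281167, 50000, 1000
```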
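
For the "Experiment Setup" row: the sketch below wires the reported hyperparameters (75 epochs, AdamW with learning rate 4e-4 and weight decay 1e-2, cosine decay, batch size 128, mixed precision) into a generic PyTorch loop. The dummy model and random tensors are placeholders for the GHN-3 and the ImageNet loader, and the predicted-parameter regularizer with γ=3e-5 (Eq. 10) is omitted.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Reported GHN-3 optimization settings: 75 epochs, AdamW, lr 4e-4, cosine decay,
# weight decay 1e-2, batch size 128, automatic mixed precision.
# The tiny linear model and random data below are placeholders, not the authors' code.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(3 * 224 * 224, 1000).to(device)
data = TensorDataset(torch.randn(256, 3 * 224 * 224), torch.randint(0, 1000, (256,)))
loader = DataLoader(data, batch_size=128, shuffle=True)

epochs = 75
opt = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=1e-2)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
scaler = torch.cuda.amp.GradScaler(enabled=(device == 'cuda'))

for epoch in range(epochs):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        with torch.cuda.amp.autocast(enabled=(device == 'cuda')):
            # Cross-entropy only; the paper adds a predicted-parameter
            # regularizer with gamma = 3e-5 (Eq. 10), omitted here.
            loss = nn.functional.cross_entropy(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()
    sched.step()
```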