Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate if neural networks initialized with the parameters w_pred predicted by GHNs obtain high performance without any training (Eq. 2) and after fine-tuning (Eq. 3). We focus on a large-scale ImageNet setting, but also evaluate in a transfer learning setting from ImageNet to few-shot image classification and object detection tasks. |
| Researcher Affiliation | Collaboration | Samsung SAIT AI Lab, Montreal; Samsung Advanced Institute of Technology (SAIT), South Korea; Mila, Université de Montréal; Canada CIFAR AI Chair. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We significantly scale up our Transformer-based GHN (GHN-3) and we release our largest model achieving the best results at https://github.com/SamsungSAILMontreal/ghn3 (Fig. 1). A hedged usage sketch follows the table. |
| Open Datasets | Yes | We train the GHNs on the ILSVRC-2012 ImageNet dataset (Russakovsky et al., 2015) with 1.28M training and 50K validation images of the 1k classes. A data-loading sketch follows the table. |
| Dataset Splits | Yes | We train the GHNs on the ILSVRC-2012 ImageNet dataset (Russakovsky et al., 2015) with 1.28M training and 50K validation images of the 1k classes. |
| Hardware Specification | Yes | Train time is for GHNs with m = 8 and is measured on 4x NVIDIA-A100 GPUs. |
| Software Dependencies | No | The paper mentions 'automatic mixed precision in PyTorch' but does not specify the version number of PyTorch or other software dependencies. |
| Experiment Setup | Yes | We train the GHNs on the ILSVRC-2012 ImageNet dataset (Russakovsky et al., 2015) ... for 75 epochs using AdamW (Loshchilov & Hutter, 2017), initial learning rate 4e-4 decayed using the cosine scheduling, weight decay λ=1e-2, predicted parameter regularization γ=3e-5 (Eq. 10), batch size b=128 and automatic mixed precision in PyTorch (Paszke et al., 2019). A training-setup sketch follows the table. |
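
For the "Research Type" and "Open Source Code" rows: GHN-3 predicts all parameters of a given architecture in a single forward pass, after which the network can be evaluated with no training (Eq. 2) or fine-tuned (Eq. 3). Below is a minimal sketch of that workflow; the `from_pretrained` loader and the checkpoint name are assumptions about the ghn3 package's interface and should be checked against the README at https://github.com/SamsungSAILMontreal/ghn3.

```python
import torch
import torchvision

# Assumed interface of the released ghn3 package (verify against the repository README):
# `from_pretrained` and the checkpoint name below are assumptions, not a confirmed API.
from ghn3 import from_pretrained

ghn = from_pretrained('ghn3xlm16.pt')        # largest released GHN-3 (assumed checkpoint name)
net = torchvision.models.resnet50()          # any ImageNet architecture
net = ghn(net)                               # predict all parameters in one forward pass

# No-training evaluation (Eq. 2): plain inference with the predicted parameters.
net.eval()
with torch.no_grad():
    logits = net(torch.randn(1, 3, 224, 224))   # dummy input standing in for ImageNet val images
print(logits.argmax(dim=1))

# Fine-tuning (Eq. 3) would proceed as ordinary ImageNet training, starting from `net`.
```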
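
For the "Open Datasets" and "Dataset Splits" rows: the quoted split is the standard ILSVRC-2012 split exposed by torchvision. A minimal loading sketch, assuming the official archives are already present under a placeholder root `data/imagenet`:

```python
import torchvision
import torchvision.transforms as T

# Standard ILSVRC-2012 splits: ~1.28M training and 50K validation images over 1000 classes.
# 'data/imagenet' is a placeholder root; torchvision expects the official archives there.
transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
train_set = torchvision.datasets.ImageNet('data/imagenet', split='train', transform=transform)
val_set = torchvision.datasets.ImageNet('data/imagenet', split='val', transform=transform)
print(len(train_set), len(val_set), len(train_set.classes))   # ~1281167, 50000, 1000
```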
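
For the "Experiment Setup" row: the sketch below wires the reported hyperparameters (75 epochs, AdamW with learning rate 4e-4 and weight decay 1e-2, cosine decay, batch size 128, mixed precision) into a generic PyTorch loop. The dummy model and random tensors are placeholders for the GHN-3 and the ImageNet loader, and the predicted-parameter regularizer with γ=3e-5 (Eq. 10) is omitted.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Reported GHN-3 optimization settings: 75 epochs, AdamW, lr 4e-4, cosine decay,
# weight decay 1e-2, batch size 128, automatic mixed precision.
# The tiny linear model and random data below are placeholders, not the authors' code.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(3 * 224 * 224, 1000).to(device)
data = TensorDataset(torch.randn(256, 3 * 224 * 224), torch.randint(0, 1000, (256,)))
loader = DataLoader(data, batch_size=128, shuffle=True)

epochs = 75
opt = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=1e-2)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
scaler = torch.cuda.amp.GradScaler(enabled=(device == 'cuda'))

for epoch in range(epochs):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        with torch.cuda.amp.autocast(enabled=(device == 'cuda')):
            # Cross-entropy only; the paper adds a predicted-parameter
            # regularizer with gamma = 3e-5 (Eq. 10), omitted here.
            loss = nn.functional.cross_entropy(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()
    sched.step()
```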