Parameter Prediction for Unseen Deep Architectures

Authors: Boris Knyazev, Michal Drozdzal, Graham W. Taylor, Adriana Romero-Soriano

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce a large-scale dataset of diverse computational graphs of neural architectures, DEEPNETS-1M, and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50 achieving a 60% accuracy on CIFAR-10. (A conceptual parameter-prediction sketch follows this table.)
Researcher Affiliation | Collaboration | Boris Knyazev (1,2), Michal Drozdzal (4), Graham W. Taylor (1,2,3), Adriana Romero-Soriano (4,5); 1 University of Guelph, 2 Vector Institute for Artificial Intelligence, 3 Canada CIFAR AI Chair, 4 Facebook AI Research, 5 McGill University
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our DEEPNETS-1M dataset, trained GHNs and code is available at https://github.com/facebookresearch/ppuda.
Open Datasets | Yes | We use the DEEPNETS-1M dataset of architectures (§3) as well as two image classification datasets D1 (CIFAR-10 [15]) and D2 (ImageNet [1]). CIFAR-10 consists of 50k training and 10k test images... ImageNet is a larger scale dataset with 1.28M training and 50k test images... Our DEEPNETS-1M dataset, trained GHNs and code is available at https://github.com/facebookresearch/ppuda.
Dataset Splits | Yes | CIFAR-10 consists of 50k training and 10k test images of size 32×32×3 and 10 object categories. ImageNet is a larger scale dataset with 1.28M training and 50k test images of variable size and 1000 fine-grained object categories. We use 5k/50k training images as a validation set in CIFAR-10/ImageNet and 500 validation architectures of DEEPNETS-1M for hyperparameter tuning. In-distribution (ID) architectures: We generate a training set of |F| = 10^6 architectures and validation/test sets of 500/500 architectures that follow the same generation rules and are considered to be ID samples. (A split sketch follows this table.)
Hardware Specification | Yes | To report speeds on ImageNet in Table 4, we use a dedicated machine with a single NVIDIA V100-32GB and Intel Xeon CPU E5-1620 v4 @ 3.50GHz.
Software Dependencies | No | The paper mentions optimizers like Adam but does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries).
Experiment Setup | Yes | On CIFAR-10, we train evaluation architectures with SGD/Adam, initial learning rate η = 0.025 / η = 0.001, batch size b = 96 and up to 50 epochs. On ImageNet, we train them with SGD, η = 0.1 and b = 128, and, for computational reasons (given 1402 evaluation architectures in total), we limit training with SGD to 1 epoch. We follow [24] and train GHNs with Adam, η = 0.001 and batch size of 64 images for CIFAR-10 and 256 for ImageNet. We train for up to 300 epochs... (A training-loop sketch follows this table.)
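
To make the "single forward pass" claim in the Research Type row concrete, the following is a minimal conceptual sketch of parameter prediction with a hypernetwork. It is not the paper's GHN-2 (which encodes the whole computational graph with a graph neural network trained on DEEPNETS-1M and predicts the parameters of arbitrary architectures); the TinyHyperNet class, the fixed-size architecture embedding, and the single-layer target are illustrative assumptions.

```python
# Conceptual sketch only: a toy hypernetwork emits the weights of a target
# linear layer in one forward pass. The paper's GHN-2 instead encodes the full
# computational graph with a GNN and predicts all parameters of the network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyHyperNet(nn.Module):
    """Maps an architecture embedding to (weight, bias) of Linear(in_dim, out_dim)."""
    def __init__(self, embed_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, out_dim * in_dim + out_dim),
        )

    def forward(self, arch_embedding):
        flat = self.mlp(arch_embedding)
        weight = flat[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        bias = flat[self.out_dim * self.in_dim:]
        return weight, bias

hyper = TinyHyperNet(embed_dim=16, in_dim=32, out_dim=10)
arch_embedding = torch.randn(16)      # stand-in for a GNN encoding of the target architecture
weight, bias = hyper(arch_embedding)  # one forward pass -> usable parameters
logits = F.linear(torch.randn(4, 32), weight, bias)
print(logits.shape)                   # torch.Size([4, 10])
```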
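The CIFAR-10 split quoted in the Dataset Splits row (50k training / 10k test images, with 5k training images held out for validation) can be reproduced with a few lines of torchvision. A random hold-out is assumed here; the released ppuda code may select the validation images differently.

```python
# Sketch of the CIFAR-10 split described above: 45k train / 5k validation held
# out of the 50k official training images, plus the 10k official test images.
# A random hold-out with a fixed seed is an assumption.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
full_train = datasets.CIFAR10('./data', train=True, download=True, transform=transform)
test_set = datasets.CIFAR10('./data', train=False, download=True, transform=transform)

train_set, val_set = random_split(
    full_train, [45_000, 5_000], generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set), len(test_set))  # 45000 5000 10000
```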
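The Experiment Setup row quotes the hyperparameters used to train the CIFAR-10 evaluation architectures with SGD (η = 0.025, b = 96, up to 50 epochs). A bare-bones training loop with those values might look as follows; the momentum value, the ResNet-18 stand-in architecture, and the omission of a learning-rate schedule, weight decay, and data augmentation are simplifying assumptions not taken from the paper.

```python
# Bare-bones sketch of the quoted CIFAR-10 evaluation-training setup:
# SGD, lr = 0.025, batch size 96, up to 50 epochs. Momentum, the architecture
# choice, and the missing schedule/augmentation are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18

train_set = datasets.CIFAR10('./data', train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=96, shuffle=True, num_workers=2)

model = resnet18(num_classes=10)  # stand-in for one of the evaluation architectures
optimizer = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):  # "up to 50 epochs"
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```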