Parameter Prediction for Unseen Deep Architectures
Authors: Boris Knyazev, Michal Drozdzal, Graham W. Taylor, Adriana Romero-Soriano
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a large-scale dataset of diverse computational graphs of neural architectures, DEEPNETS-1M, and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50 achieving a 60% accuracy on CIFAR-10. |
| Researcher Affiliation | Collaboration | Boris Knyazev (1,2), Michal Drozdzal (4), Graham W. Taylor (1,2,3), Adriana Romero-Soriano (4,5); 1: University of Guelph, 2: Vector Institute for Artificial Intelligence, 3: Canada CIFAR AI Chair, 4: Facebook AI Research, 5: McGill University |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our DEEPNETS-1M dataset, trained GHNs and code is available at https://github.com/facebookresearch/ppuda. |
| Open Datasets | Yes | We use the DEEPNETS-1M dataset of architectures (§3) as well as two image classification datasets D1 (CIFAR-10 [15]) and D2 (ImageNet [1]). CIFAR-10 consists of 50k training and 10k test images... ImageNet is a larger scale dataset with 1.28M training and 50k test images... Our DEEPNETS-1M dataset, trained GHNs and code is available at https://github.com/facebookresearch/ppuda. |
| Dataset Splits | Yes | CIFAR-10 consists of 50k training and 10k test images of size 32×32×3 and 10 object categories. ImageNet is a larger scale dataset with 1.28M training and 50k test images of variable size and 1000 fine-grained object categories. We use 5k/50k training images as a validation set in CIFAR-10/ImageNet and 500 validation architectures of DEEPNETS-1M for hyperparameter tuning. In-distribution (ID) architectures. We generate a training set of \|F\| = 10^6 architectures and validation/test sets of 500/500 architectures that follow the same generation rules and are considered to be ID samples. |
| Hardware Specification | Yes | To report speeds on ImageNet in Table 4, we use a dedicated machine with a single NVIDIA V100-32GB and Intel Xeon CPU E5-1620 v4 @ 3.50GHz. |
| Software Dependencies | No | The paper mentions optimizers like Adam but does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | On CIFAR-10, we train evaluation architectures with SGD/Adam, initial learning rate η = 0.025 / η = 0.001, batch size b = 96 and up to 50 epochs. On ImageNet, we train them with SGD, η = 0.1 and b = 128, and, for computational reasons (given 1402 evaluation architectures in total), we limit training with SGD to 1 epoch. We follow [24] and train GHNs with Adam, η = 0.001 and batch size of 64 images for CIFAR-10 and 256 for ImageNet. We train for up to 300 epochs... |
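
To make the parameter-prediction claim in the Research Type and Open Source Code rows concrete, below is a minimal sketch of how predicting ResNet-50 parameters with a released GHN checkpoint is expected to look. The `GHN2` import path and the `'cifar10'` checkpoint name are assumptions based on the ppuda repository README, not a verified API.

```python
# Hedged sketch: predicting parameters for an unseen architecture with a
# released GHN-2 checkpoint from https://github.com/facebookresearch/ppuda.
# The import path and checkpoint identifier below are assumptions and may
# differ from the repository's actual API.
import torchvision
from ppuda.ghn.nn import GHN2  # assumed module path

ghn = GHN2('cifar10')                 # load a GHN-2 trained on CIFAR-10 (assumed name)
model = torchvision.models.resnet50(num_classes=10)
model = ghn(model)                    # predict all ~24M parameters in one forward pass
```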
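The Experiment Setup and Dataset Splits rows map onto a standard PyTorch training configuration. The sketch below restates only the quoted values (SGD, η = 0.025, batch size b = 96, up to 50 epochs, a 5k-image CIFAR-10 validation split); the evaluation architecture, data augmentation, and momentum are placeholder assumptions.

```python
# Hedged sketch: training one CIFAR-10 evaluation architecture using only the
# hyperparameters quoted above (SGD, lr=0.025, batch size 96, up to 50 epochs,
# 5k training images held out for validation). Other choices are assumptions.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor()])  # augmentation omitted; assumption
full_train = torchvision.datasets.CIFAR10('./data', train=True, download=True, transform=transform)
train_set, val_set = torch.utils.data.random_split(full_train, [45_000, 5_000])

train_loader = torch.utils.data.DataLoader(train_set, batch_size=96, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=96)

model = torchvision.models.resnet18(num_classes=10)  # placeholder evaluation architecture
optimizer = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9)  # momentum is an assumption
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(50):  # "up to 50 epochs"
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```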