Parameter Prediction for Unseen Deep Architectures

Authors: Boris Knyazev, Michal Drozdzal, Graham W. Taylor, Adriana Romero-Soriano

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce a large-scale dataset of diverse computational graphs of neural architectures, DEEPNETS-1M, and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50 achieving a 60% accuracy on CIFAR-10. (A conceptual parameter-prediction sketch follows this table.)
Researcher Affiliation | Collaboration | Boris Knyazev (1,2), Michal Drozdzal (4), Graham W. Taylor (1,2,3), Adriana Romero-Soriano (4,5); 1 University of Guelph, 2 Vector Institute for Artificial Intelligence, 3 Canada CIFAR AI Chair, 4 Facebook AI Research, 5 McGill University
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our DEEPNETS-1M dataset, trained GHNs and code is available at https://github.com/facebookresearch/ppuda.
Open Datasets | Yes | We use the DEEPNETS-1M dataset of architectures (§3) as well as two image classification datasets D1 (CIFAR-10 [15]) and D2 (ImageNet [1]). CIFAR-10 consists of 50k training and 10k test images... ImageNet is a larger scale dataset with 1.28M training and 50k test images... Our DEEPNETS-1M dataset, trained GHNs and code is available at https://github.com/facebookresearch/ppuda.
Dataset Splits | Yes | CIFAR-10 consists of 50k training and 10k test images of size 32×32×3 and 10 object categories. ImageNet is a larger scale dataset with 1.28M training and 50k test images of variable size and 1000 fine-grained object categories. We use 5k/50k training images as a validation set in CIFAR-10/ImageNet and 500 validation architectures of DEEPNETS-1M for hyperparameter tuning. In-distribution (ID) architectures: We generate a training set of |F| = 10^6 architectures and validation/test sets of 500/500 architectures that follow the same generation rules and are considered to be ID samples. (A split sketch follows this table.)
Hardware Specification | Yes | To report speeds on ImageNet in Table 4, we use a dedicated machine with a single NVIDIA V100-32GB and Intel Xeon CPU E5-1620 v4 @ 3.50GHz.
Software Dependencies | No | The paper mentions optimizers like Adam but does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries).
Experiment Setup | Yes | On CIFAR-10, we train evaluation architectures with SGD/Adam, initial learning rate η = 0.025 / η = 0.001, batch size b = 96 and up to 50 epochs. On ImageNet, we train them with SGD, η = 0.1 and b = 128, and, for computational reasons (given 1402 evaluation architectures in total), we limit training with SGD to 1 epoch. We follow [24] and train GHNs with Adam, η = 0.001 and batch size of 64 images for CIFAR-10 and 256 for ImageNet. We train for up to 300 epochs... (A training-loop sketch follows this table.)
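
To make the "single forward pass" claim in the Research Type row concrete, the following is a minimal conceptual sketch of parameter prediction with a hypernetwork. It is not the paper's GHN-2 (which encodes the whole computational graph with a graph neural network trained on DEEPNETS-1M and predicts the parameters of arbitrary architectures); the TinyHyperNet class, the fixed-size architecture embedding, and the single-layer target are illustrative assumptions.

```python
# Conceptual sketch only: a toy hypernetwork emits the weights of a target
# linear layer in one forward pass. The paper's GHN-2 instead encodes the full
# computational graph with a GNN and predicts all parameters of the network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyHyperNet(nn.Module):
    """Maps an architecture embedding to (weight, bias) of Linear(in_dim, out_dim)."""
    def __init__(self, embed_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, out_dim * in_dim + out_dim),
        )

    def forward(self, arch_embedding):
        flat = self.mlp(arch_embedding)
        weight = flat[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        bias = flat[self.out_dim * self.in_dim:]
        return weight, bias

hyper = TinyHyperNet(embed_dim=16, in_dim=32, out_dim=10)
arch_embedding = torch.randn(16)      # stand-in for a GNN encoding of the target architecture
weight, bias = hyper(arch_embedding)  # one forward pass -> usable parameters
logits = F.linear(torch.randn(4, 32), weight, bias)
print(logits.shape)                   # torch.Size([4, 10])
```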
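The CIFAR-10 split quoted in the Dataset Splits row (50k training / 10k test images, with 5k training images held out for validation) can be reproduced with a few lines of torchvision. A random hold-out is assumed here; the released ppuda code may select the validation images differently.

```python
# Sketch of the CIFAR-10 split described above: 45k train / 5k validation held
# out of the 50k official training images, plus the 10k official test images.
# A random hold-out with a fixed seed is an assumption.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
full_train = datasets.CIFAR10('./data', train=True, download=True, transform=transform)
test_set = datasets.CIFAR10('./data', train=False, download=True, transform=transform)

train_set, val_set = random_split(
    full_train, [45_000, 5_000], generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set), len(test_set))  # 45000 5000 10000
```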
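The Experiment Setup row quotes the hyperparameters used to train the CIFAR-10 evaluation architectures with SGD (η = 0.025, b = 96, up to 50 epochs). A bare-bones training loop with those values might look as follows; the momentum value, the ResNet-18 stand-in architecture, and the omission of a learning-rate schedule, weight decay, and data augmentation are simplifying assumptions not taken from the paper.

```python
# Bare-bones sketch of the quoted CIFAR-10 evaluation-training setup:
# SGD, lr = 0.025, batch size 96, up to 50 epochs. Momentum, the architecture
# choice, and the missing schedule/augmentation are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18

train_set = datasets.CIFAR10('./data', train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=96, shuffle=True, num_workers=2)

model = resnet18(num_classes=10)  # stand-in for one of the evaluation architectures
optimizer = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):  # "up to 50 epochs"
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```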