Graph HyperNetworks for Neural Architecture Search

Authors: Chris Zhang, Mengye Ren, Raquel Urtasun

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we use our proposed GHN to search for the best CNN architecture for image classification. First, we evaluate the GHN on the standard CIFAR (Krizhevsky & Hinton, 2009) and ImageNet (Russakovsky et al., 2015) architecture search benchmarks. Next, we apply GHN on an anytime prediction task where we optimize the speed-accuracy tradeoff that is key for many real-time applications. Finally, we benchmark the GHN's predicted-performance correlation and explore various factors in an ablation study.
Researcher Affiliation | Collaboration | Chris Zhang (1,2), Mengye Ren (1,3) & Raquel Urtasun (1,3); 1 Uber Advanced Technologies Group, 2 University of Waterloo, 3 University of Toronto
Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include a dedicated pseudocode block or a clearly labeled algorithm.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the described methodology.
Open Datasets | Yes | We conduct our initial set of experiments on CIFAR-10 (Krizhevsky & Hinton, 2009), which contains 10 object classes and 50,000 training images and 10,000 test images of size 32 × 32 × 3. We also run our GHN algorithm on the ImageNet dataset (Russakovsky et al., 2015), which contains 1.28 million training images.
Dataset Splits | Yes | We conduct our initial set of experiments on CIFAR-10 (Krizhevsky & Hinton, 2009), which contains 10 object classes and 50,000 training images and 10,000 test images of size 32 × 32 × 3. We use 5,000 images split from the training set as our validation set.
Hardware Specification | No | The paper mentions 'distributed training across 32 GPUs' but does not specify the exact models of these GPUs (e.g., NVIDIA A100, Tesla V100) or other specific hardware components.
Software Dependencies | No | The paper mentions using the 'ADAM optimizer' and a 'GRU cell' but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | For the GNN module, we use a standard GRU cell (Cho et al., 2014) with hidden size 32 and a 2-layer MLP with hidden size 32 as the recurrent cell function U and message function M, respectively. The shared hypernetwork H(·; ϕ) is a 2-layer MLP with hidden size 64. From the results of the ablation studies in Section 5.4, the GHN is trained with blocks with N = 7 nodes and T = 5 propagations under the forward-backward scheme, using the ADAM optimizer (Kingma & Ba, 2015). Training details of the final selected architectures are chosen to follow existing works and can be found in the Appendix. [...] the final candidates are trained for 600 epochs using SGD with momentum 0.9, a single-period cosine schedule with l_max = 0.025, and batch size 64.
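
To make the quoted setup and dataset split easier to cross-check, the sketch below restates the reported hyperparameters as code, assuming PyTorch (the paper names no framework, per the Software Dependencies row). The helper names (`message_fn`, `make_hypernetwork`, `make_final_optimizer`), the GRU input size, the hypernetwork output size, and the ADAM learning rate are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the hyperparameters quoted in the Experiment Setup and
# Dataset Splits rows, assuming PyTorch. Names and unspecified values are
# assumptions, not the authors' code.
import torch
import torch.nn as nn

# Dataset Splits row: 5,000 of the 50,000 CIFAR-10 training images are held
# out as the validation set.
NUM_TRAIN_IMAGES = 45_000
NUM_VAL_IMAGES = 5_000

NODE_HIDDEN = 32      # GRU hidden size for the recurrent cell function U
MSG_HIDDEN = 32       # hidden size of the 2-layer message MLP M
HYPER_HIDDEN = 64     # hidden size of the shared hypernetwork H(.; phi)
NUM_NODES = 7         # N = 7 nodes per block during GHN training
NUM_PROPAGATIONS = 5  # T = 5 propagation steps (forward-backward scheme)

# Recurrent cell function U: standard GRU cell with hidden size 32.
# Input size 32 is an assumption (messages share the node embedding size).
recurrent_cell = nn.GRUCell(input_size=NODE_HIDDEN, hidden_size=NODE_HIDDEN)

# Message function M: 2-layer MLP with hidden size 32.
message_fn = nn.Sequential(
    nn.Linear(NODE_HIDDEN, MSG_HIDDEN),
    nn.ReLU(),
    nn.Linear(MSG_HIDDEN, NODE_HIDDEN),
)

# Shared hypernetwork H(.; phi): 2-layer MLP with hidden size 64 mapping a
# node embedding to a flattened weight tensor; the output size depends on the
# operation and is left as an argument here.
def make_hypernetwork(num_weight_params: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(NODE_HIDDEN, HYPER_HIDDEN),
        nn.ReLU(),
        nn.Linear(HYPER_HIDDEN, num_weight_params),
    )

# GHN training uses ADAM; the learning rate is not quoted, so this value is
# only a placeholder.
ghn_params = list(recurrent_cell.parameters()) + list(message_fn.parameters())
ghn_optimizer = torch.optim.Adam(ghn_params, lr=1e-3)

# Final selected architectures: 600 epochs of SGD with momentum 0.9, batch
# size 64, and a single-period cosine schedule with l_max = 0.025.
BATCH_SIZE = 64
NUM_EPOCHS = 600

def make_final_optimizer(model: nn.Module):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=NUM_EPOCHS)
    return optimizer, scheduler
```

Stepping the cosine scheduler once per epoch for 600 epochs gives a single cosine period that decays the learning rate from 0.025 toward zero, matching the quoted "single period cosine schedule with l_max = 0.025".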