Graph HyperNetworks for Neural Architecture Search
Authors: Chris Zhang, Mengye Ren, Raquel Urtasun
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we use our proposed GHN to search for the best CNN architecture for image classification. First, we evaluate the GHN on the standard CIFAR (Krizhevsky & Hinton, 2009) and ImageNet (Russakovsky et al., 2015) architecture search benchmarks. Next, we apply GHN on an anytime prediction task where we optimize the speed-accuracy tradeoff that is key for many real-time applications. Finally, we benchmark the GHN's predicted-performance correlation and explore various factors in an ablation study. |
| Researcher Affiliation | Collaboration | Chris Zhang¹,², Mengye Ren¹,³ & Raquel Urtasun¹,³ (¹Uber Advanced Technologies Group, ²University of Waterloo, ³University of Toronto) |
| Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include a dedicated pseudocode block or a clearly labeled algorithm. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | We conduct our initial set of experiments on CIFAR-10 (Krizhevsky & Hinton, 2009), which contains 10 object classes and 50,000 training images and 10,000 test images of size 32×32×3. We also run our GHN algorithm on the ImageNet dataset (Russakovsky et al., 2015), which contains 1.28 million training images. |
| Dataset Splits | Yes | We conduct our initial set of experiments on CIFAR-10 (Krizhevsky & Hinton, 2009), which contains 10 object classes and 50,000 training images and 10,000 test images of size 32×32×3. We use 5,000 images split from the training set as our validation set. (A minimal data-split sketch follows this table.) |
| Hardware Specification | No | The paper mentions 'distributed training across 32 GPUs' but does not specify the exact models of these GPUs (e.g., NVIDIA A100, Tesla V100) or other specific hardware components. |
| Software Dependencies | No | The paper mentions using 'ADAM optimizer' and 'GRU cell' but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | For the GNN module, we use a standard GRU cell (Cho et al., 2014) with hidden size 32 and a 2-layer MLP with hidden size 32 as the recurrent cell function U and message function M, respectively. The shared hypernetwork H(·; ϕ) is a 2-layer MLP with hidden size 64. From the results of ablation studies in Section 5.4, the GHN is trained with blocks with N = 7 nodes and T = 5 propagations under the forward-backward scheme, using the ADAM optimizer (Kingma & Ba, 2015). Training details of the final selected architectures are chosen to follow existing works and can be found in the Appendix. [...] the final candidates are trained for 600 epochs using SGD with momentum 0.9, a single-period cosine schedule with l_max = 0.025, and batch size 64. (Minimal configuration sketches follow this table.) |
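
The CIFAR-10 split quoted in the Dataset Splits row (50,000 training images with 5,000 held out for validation, plus the 10,000-image test set) can be reproduced with standard tooling. The snippet below is a minimal sketch assuming PyTorch/torchvision; it is not code released by the authors, and the random seed is illustrative.

```python
# Minimal sketch of the CIFAR-10 split described in the paper: 45,000 training
# images, a 5,000-image validation set split from the training set, and the
# standard 10,000-image test set of 32x32x3 images.
# Assumes PyTorch/torchvision; not the authors' released code.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()

full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Hold out 5,000 of the 50,000 training images for validation.
train_set, val_set = random_split(
    full_train, [45_000, 5_000], generator=torch.Generator().manual_seed(0)
)

print(len(train_set), len(val_set), len(test_set))  # 45000 5000 10000
```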
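
The GNN/hypernetwork sizes quoted in the Experiment Setup row (GRU cell with hidden size 32, 2-layer MLP message function with hidden size 32, and a 2-layer MLP hypernetwork with hidden size 64) map onto standard modules as in the sketch below. This assumes PyTorch; the module names, the ReLU activations, and the generated-weight dimension `weight_dim` are illustrative placeholders, not details taken from the paper.

```python
# Minimal sketch of the GHN module sizes quoted above. Assumes PyTorch;
# the names, ReLU activations, and `weight_dim` are illustrative assumptions.
import torch.nn as nn

HIDDEN = 32  # node-embedding size used by the GNN

# Recurrent cell function U: a standard GRU cell with hidden size 32.
update_fn_U = nn.GRUCell(input_size=HIDDEN, hidden_size=HIDDEN)

# Message function M: a 2-layer MLP with hidden size 32.
message_fn_M = nn.Sequential(nn.Linear(HIDDEN, 32), nn.ReLU(), nn.Linear(32, HIDDEN))

# Shared hypernetwork H(·; ϕ): a 2-layer MLP with hidden size 64 mapping a node
# embedding to a flattened weight tensor of size `weight_dim` (placeholder).
weight_dim = 128
hypernetwork_H = nn.Sequential(nn.Linear(HIDDEN, 64), nn.ReLU(), nn.Linear(64, weight_dim))
```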
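
The final-architecture training recipe quoted above (600 epochs of SGD with momentum 0.9, a single-period cosine schedule with l_max = 0.025, batch size 64) corresponds to a standard optimizer/scheduler pairing. The sketch below assumes PyTorch; `model` and the training-loop body are placeholders, and this is not the authors' released code.

```python
# Minimal sketch of the final-candidate training setup quoted above:
# 600 epochs, SGD with momentum 0.9, single-period cosine schedule with
# l_max = 0.025, batch size 64. Assumes PyTorch; `model` is a stand-in.
import torch

EPOCHS, BATCH_SIZE, L_MAX = 600, 64, 0.025

model = torch.nn.Linear(3 * 32 * 32, 10)  # placeholder for a selected architecture

optimizer = torch.optim.SGD(model.parameters(), lr=L_MAX, momentum=0.9)
# One cosine period over all 600 epochs, annealing the learning rate from l_max toward 0.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # ... one pass over the training loader (batch size 64) goes here ...
    scheduler.step()
```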