Generalizing Few-Shot NAS with Gradient Matching

Authors: Shoukang Hu, Ruochen Wang, Lanqing Hong, Zhenguo Li, Cho-Jui Hsieh, Jiashi Feng

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical evaluations of the proposed method on a wide range of search spaces (NASBench-201, DARTS, MobileNet space), datasets (CIFAR-10, CIFAR-100, ImageNet), and search algorithms (DARTS, SNAS, RSPS, ProxylessNAS, OFA) demonstrate that it significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures.
Researcher Affiliation | Collaboration | (1) The Chinese University of Hong Kong, (2) University of California, Los Angeles, (3) Huawei Noah's Ark Lab, (4) National University of Singapore
Pseudocode | Yes | Appendix A: Pseudocode for GM-NAS. (An illustrative sketch of the gradient-matching split appears after this table.)
Open Source Code | Yes | Our code is available at https://github.com/skhu101/GM-NAS.
Open Datasets | Yes | We benchmark the proposed method on the full NASBench-201 space (Dong & Yang, 2020) with five operations (none, skip, conv 1x1, conv 3x3, avgpool 3x3).
Dataset Splits | Yes | Table 8: Performance comparison among derived child networks using different supernet selection criteria in Few-Shot NAS and GM-NAS.
Hardware Specification | No | The paper mentions 'GPU hours' for search cost but does not specify any particular GPU models, CPU models, or other hardware used for experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or library versions).
Experiment Setup | Yes | The derived architecture is trained from scratch with a batch size of 96 for 600 epochs. We use SGD with an initial learning rate of 0.0025, a momentum of 0.9, a weight decay of 3×10⁻⁴, and a cosine learning rate scheduler. In addition, we also deploy cutout regularization with length 16, drop-path with probability 0.3, and an auxiliary tower of weight 0.4. (A sketch of this training setup appears after the table.)
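
On the Pseudocode row: the paper's Appendix A gives the authors' pseudocode for GM-NAS; below is only a minimal, illustrative sketch of the gradient-matching split it describes, assuming a PyTorch-style supernet. The `edge.activate` hook, the `batch.x`/`batch.y` fields, and the two-way spectral bisection of the gradient-similarity graph are assumptions made for illustration, not the released implementation.

```python
# Illustrative sketch of a gradient-matching supernet split (assumptions noted
# above; this is not the authors' GM-NAS code).
import torch
import torch.nn.functional as F


def per_op_gradients(supernet, edge, ops, loss_fn, batch):
    """One flattened shared-weight gradient vector per candidate operation."""
    grads = []
    for op in ops:
        supernet.zero_grad()
        edge.activate(op)  # hypothetical hook: keep only `op` active on this edge
        loss = loss_fn(supernet(batch.x), batch.y)
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in supernet.parameters()
                       if p.grad is not None])
        grads.append(g.detach().clone())
    return torch.stack(grads)  # shape: (num_ops, num_params)


def split_by_gradient_matching(grads):
    """Partition ops into two groups so that gradients within a group agree."""
    sim = F.cosine_similarity(grads.unsqueeze(1), grads.unsqueeze(0), dim=-1)
    affinity = sim.clamp(min=0)                  # non-negative edge weights
    laplacian = torch.diag(affinity.sum(dim=1)) - affinity
    _, eigvecs = torch.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]                      # second-smallest eigenvector
    group_a = (fiedler >= 0).nonzero(as_tuple=True)[0].tolist()
    group_b = (fiedler < 0).nonzero(as_tuple=True)[0].tolist()
    return group_a, group_b
```

Each group of operations would then receive its own sub-supernet, as in Few-Shot NAS, but with the split decided by gradient agreement rather than exhaustive enumeration.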
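
On the Experiment Setup row: a minimal sketch of the quoted retraining recipe (SGD, initial learning rate 0.0025, momentum 0.9, weight decay 3×10⁻⁴, cosine schedule over 600 epochs, batch size 96), assuming PyTorch. `derived_net` is a placeholder for the searched architecture, and cutout, drop-path, and the auxiliary tower are only indicated in comments.

```python
import torch


def build_retraining(derived_net, epochs=600):
    """Optimizer and LR schedule matching the setup quoted in the table row."""
    optimizer = torch.optim.SGD(
        derived_net.parameters(),
        lr=0.0025,           # initial learning rate as quoted above
        momentum=0.9,
        weight_decay=3e-4,
    )
    # Cosine learning-rate decay over the full retraining run.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    # Not shown: cutout (length 16) in the data pipeline, drop-path with
    # probability 0.3 inside the network, and an auxiliary tower whose loss
    # is added to the main loss with weight 0.4.
    return optimizer, scheduler
```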