Generalizing Few-Shot NAS with Gradient Matching
Authors: Shoukang Hu, Ruochen Wang, Lanqing Hong, Zhenguo Li, Cho-Jui Hsieh, Jiashi Feng
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical evaluations of the proposed method on a wide range of search spaces (NASBench-201, DARTS, MobileNet Space), datasets (cifar10, cifar100, ImageNet) and search algorithms (DARTS, SNAS, RSPS, ProxylessNAS, OFA) demonstrate that it significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures. |
| Researcher Affiliation | Collaboration | (1) The Chinese University of Hong Kong; (2) University of California, Los Angeles; (3) Huawei Noah's Ark Lab; (4) National University of Singapore |
| Pseudocode | Yes | Appendix A: Pseudocode for GM-NAS |
| Open Source Code | Yes | Our code is available at https://github.com/skhu101/GM-NAS. |
| Open Datasets | Yes | We benchmark the proposed method on the full NASBench-201 Space (Dong & Yang, 2020) with five operations (none, skip, conv 1x1, conv 3x3, avgpool 3x3). |
| Dataset Splits | Yes | Table 8: Performance comparison among derived child networks using different supernet selection criteria in Few-Shot NAS and GM-NAS |
| Hardware Specification | No | The paper mentions 'GPU hours' for search cost but does not specify any particular GPU models, CPU models, or other hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or library versions). |
| Experiment Setup | Yes | The derived architecture is trained from scratch with a batch size of 96 for 600 epochs. We use SGD with an initial learning rate of 0.0025, a momentum of 0.9, a weight decay of 3 × 10⁻⁴, and a cosine learning rate scheduler. In addition, we also deploy the cutout regularization with length 16, drop-path with probability 0.3, and an auxiliary tower of weight 0.4. |
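
The retraining hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, together with a sketch of how a cosine schedule would decay the learning rate over the 600 epochs. This is an illustrative sketch, not code from the GM-NAS repository: the dictionary keys and the `cosine_lr` helper are hypothetical names, the weight decay is taken as 3 × 10⁻⁴, and the schedule assumes annealing to a floor of zero with no warmup, which the quoted text does not specify.

```python
import math

# Values quoted in the Experiment Setup row; the key names are illustrative only.
RETRAIN_CONFIG = {
    "batch_size": 96,
    "epochs": 600,
    "optimizer": "SGD",
    "initial_lr": 0.0025,
    "momentum": 0.9,
    "weight_decay": 3e-4,  # 3 x 10^-4
    "cutout_length": 16,
    "drop_path_prob": 0.3,
    "auxiliary_weight": 0.4,
}


def cosine_lr(epoch, total_epochs, initial_lr, min_lr=0.0):
    """Return the cosine-annealed learning rate at a 0-indexed epoch.

    min_lr=0.0 is an assumption; the quoted setup does not state a floor.
    """
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    # Half-cosine factor that goes from 1 at epoch 0 to 0 at total_epochs.
    factor = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (initial_lr - min_lr) * factor


if __name__ == "__main__":
    # Print the learning rate at a few checkpoints of the schedule.
    for epoch in (0, 150, 300, 450, 600):
        lr = cosine_lr(epoch, RETRAIN_CONFIG["epochs"], RETRAIN_CONFIG["initial_lr"])
        print(f"epoch {epoch:3d}: lr = {lr:.6f}")
```

Under these assumptions the learning rate starts at 0.0025, reaches half that value at epoch 300, and decays to zero at epoch 600.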