Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Generalizing Few-Shot NAS with Gradient Matching
Authors: Shoukang Hu, Ruochen Wang, Lanqing HONG, Zhenguo Li, Cho-Jui Hsieh, Jiashi Feng
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical evaluations of the proposed method on a wide range of search spaces (NASBench-201, DARTS, MobileNet Space), datasets (cifar10, cifar100, ImageNet) and search algorithms (DARTS, SNAS, RSPS, ProxylessNAS, OFA) demonstrate that it significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures. |
| Researcher Affiliation | Collaboration | ¹The Chinese University of Hong Kong; ²University of California, Los Angeles; ³Huawei Noah's Ark Lab; ⁴National University of Singapore |
| Pseudocode | Yes | A PSEUDOCODE FOR GM-NAS |
| Open Source Code | Yes | Our code is available at https://github.com/skhu101/GM-NAS. |
| Open Datasets | Yes | We benchmark the proposed method on the full NASBench-201 Space (Dong & Yang, 2020) with five operations (none, skip, conv 1x1, conv 3x3, avgpool 3x3). |
| Dataset Splits | Yes | Table 8: Performance comparison among derived child networks using different supernet selection criteria in Few-Shot NAS and GM-NAS |
| Hardware Specification | No | The paper mentions 'GPU hours' for search cost but does not specify any particular GPU models, CPU models, or other hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or library versions). |
| Experiment Setup | Yes | The derived architecture is trained from scratch with a batch size 96 for 600 epochs. We use SGD with an initial learning rate of 0.0025, a momentum of 0.9, and a weight decay of 3 × 10⁻⁴, and a cosine learning rate scheduler. In addition, we also deploy the cutout regularization with length 16, drop-path with probability 0.3, and an auxiliary tower of weight 0.4. |
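
To make the Experiment Setup row concrete, the following is a minimal sketch of the cosine learning rate schedule it describes, using the quoted initial learning rate and epoch count. The minimum learning rate of zero and the per-epoch update granularity are assumptions; the row does not state them, and the function name `cosine_lr` is hypothetical rather than taken from the authors' code.

```python
import math

# Values quoted in the Experiment Setup row.
INITIAL_LR = 0.0025
EPOCHS = 600

# Assumption (not stated in the row): the schedule anneals to zero,
# and the learning rate is updated once per epoch.
MIN_LR = 0.0


def cosine_lr(epoch: int) -> float:
    """Learning rate at a given epoch under standard cosine annealing."""
    progress = epoch / EPOCHS
    return MIN_LR + 0.5 * (INITIAL_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at a few points along the schedule.
    for epoch in (0, 150, 300, 450, 600):
        print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

Under these assumptions the learning rate starts at the initial value, reaches half of it at the midpoint of training, and decays to the assumed minimum at the final epoch.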