KNAS: Green Neural Architecture Search
Authors: Jingjing Xu, Liang Zhao, Junyang Lin, Rundong Gao, Xu Sun, Hongxia Yang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that KNAS achieves competitive results while being orders of magnitude faster than train-then-test paradigms on image classification tasks. Furthermore, the extremely low search cost enables its wide application. The searched network also outperforms the strong baseline RoBERTa-large on two text classification tasks. |
| Researcher Affiliation | Collaboration | MOE Key Lab of Computational Linguistics, School of EECS, Peking University; Center for Data Science, Peking University; Alibaba Group. Correspondence to: Jingjing Xu <jingjingxu@pku.edu.cn>, Xu Sun <xusun@pku.edu.cn>. |
| Pseudocode | Yes | Algorithm 1: KNAS Algorithm (a hedged code sketch of this procedure appears after the table). |
| Open Source Code | Yes | Codes are available at https://github.com/Jingjing-NLP/KNAS. |
| Open Datasets | Yes | NAS-Bench-201 (Dong & Yang, 2020) is a benchmark dataset for NAS algorithms, constructed on image classification tasks, including CIFAR-10, CIFAR-100, and ImageNet16-120 (ImageNet-16). CIFAR-10 and CIFAR-100 are two widely used datasets. CIFAR-10 consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class; there are 50,000 training images and 10,000 test images. CIFAR-100 has 100 classes containing 600 images each, with 500 training images and 100 test images per class. ImageNet-16 is provided by Chrabaszcz et al. (2017). (A loader sketch confirming the CIFAR-10 counts appears after the table.) |
| Dataset Splits | No | The paper mentions a validation set in Algorithm 1 and states that NAS-Bench-201 architectures contain 'validation accuracy', but does not specify the explicit split proportions or sample counts for the validation sets for the datasets used (CIFAR10/100/ImageNet-16). |
| Hardware Specification | Yes | All baselines are implemented on a single NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using Hugging Face for RoBERTa, but no specific software versions (e.g., Python version, library versions) are provided. |
| Experiment Setup | Yes | For all approaches, we set the time of architecture training plus evaluation to 2,160 seconds, 4,600 seconds, and 10,000 seconds on CIFAR-10, CIFAR-100, and ImageNet-16, respectively. ... For MRPC, the batch size is set to 4, and the learning rate is set to 3e-5. For RTE, the batch size is set to 4, and the learning rate is set to 2e-5. For the remaining hyper-parameters, we use the default settings. (A fine-tuning sketch with these hyper-parameters appears after the table.) |
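The Pseudocode row cites Algorithm 1 by name only. Below is a minimal PyTorch sketch of the selection procedure KNAS describes: score randomly initialized candidates by the mean of the Gram matrix of their per-sample gradients (the paper's gradient-kernel proxy), keep the top-k, and pick the winner by short training plus validation. The function names (`mgm_score`, `knas_select`) and the `train_and_eval` callback are our assumptions, not the authors' released code.

```python
import torch

def mgm_score(model, loss_fn, xs, ys):
    """Mean of the Gram matrix of per-sample gradients at random
    initialization (the gradient-kernel proxy KNAS ranks by)."""
    model.train()
    grads = []
    for x, y in zip(xs, ys):                       # small probe batch
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten()
                       for p in model.parameters() if p.grad is not None])
        grads.append(g.detach())
    G = torch.stack(grads)                         # (n, d) per-sample gradients
    return (G @ G.t()).mean().item()               # higher -> likely more trainable

def knas_select(candidates, loss_fn, xs, ys, k, train_and_eval):
    """Shape of Algorithm 1: rank all candidates by the proxy, then train
    only the top-k briefly and keep the best by validation accuracy."""
    ranked = sorted(candidates,
                    key=lambda m: mgm_score(m, loss_fn, xs, ys),
                    reverse=True)
    return max(ranked[:k], key=train_and_eval)     # train_and_eval -> val. accuracy
```

Because only the top-k candidates are ever trained, the proxy does the bulk of the pruning at initialization, which is where the "green" search-cost savings come from.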
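As a quick sanity check of the split sizes quoted in the Open Datasets row, the standard torchvision loaders reproduce the 50,000/10,000 CIFAR-10 train/test counts. The paper itself consumes these datasets through NAS-Bench-201, so this loader is illustrative only.

```python
import torchvision
from torchvision import transforms

tfm = transforms.ToTensor()
train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=tfm)
test = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=tfm)
assert len(train) == 50_000 and len(test) == 10_000  # counts quoted in the table
```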
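The Experiment Setup row gives only the batch size and learning rate for the GLUE fine-tuning runs. A minimal Hugging Face sketch of the MRPC configuration follows; batch size 4 and learning rate 3e-5 come from the row above, while the baseline checkpoint (`roberta-large`), the epoch count, and the preprocessing are our assumptions (the paper fine-tunes its searched network, with RoBERTa-large as the baseline).

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

mrpc = load_dataset("glue", "mrpc")
enc = mrpc.map(lambda b: tok(b["sentence1"], b["sentence2"], truncation=True),
               batched=True)

args = TrainingArguments(
    output_dir="mrpc-out",
    per_device_train_batch_size=4,   # batch size reported for MRPC
    learning_rate=3e-5,              # MRPC learning rate from the paper
    num_train_epochs=3,              # assumed; not stated in this section
)
Trainer(model=model, args=args, tokenizer=tok,
        train_dataset=enc["train"], eval_dataset=enc["validation"]).train()
```

For RTE, the same sketch applies with `load_dataset("glue", "rte")` and a learning rate of 2e-5, per the same row.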