AGNAS: Attention-Guided Micro and Macro-Architecture Search
Authors: Zihao Sun, Yu Hu, Shun Lu, Longxing Yang, Jilin Mei, Yinhe Han, Xiaowei Li
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that AGNAS can achieve a 2.46% test error on CIFAR-10 in the DARTS search space, and a 23.4% test error when searching directly on ImageNet in the ProxylessNAS search space. AGNAS also achieves optimal performance on NAS-Bench-201, outperforming state-of-the-art approaches. |
| Researcher Affiliation | Academia | (1) Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (2) State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (3) School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 AGNAS Search Algorithm |
| Open Source Code | Yes | The source code is available at https://github.com/Sunzh1996/AGNAS. |
| Open Datasets | Yes | We conduct the micro search and macro search on two popular image classification datasets, CIFAR-10 and ImageNet. The CIFAR-10 dataset has 50K training RGB images and 10K testing RGB images with a fixed spatial resolution of 32×32. ... The ILSVRC2012 ImageNet dataset contains 1.28M training and 50K validation images with 1000 object categories. |
| Dataset Splits | Yes | In the search phase, we split the training set in half: one half is used to train the super-network weights and the attention module weights, and the other half serves as a validation set to select the optimal operation in micro search by forward propagation of all images to obtain the corresponding attention weights. ... and 2.5% of the training data is used as a validation set to select the final choice block at each layer based on the attention weights. (A hedged sketch of this split-and-select procedure follows the table.) |
| Hardware Specification | Yes | The micro search process takes nine hours on a 1080Ti GPU with only 10 GB of GPU memory. The super-network with the added attention module is trained on 8 NVIDIA V100 GPUs on 10% of the training data for 50 epochs with a batch size of 64 per GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for key software components or libraries used in the experiments. |
| Experiment Setup | Yes | In particular, we use the SGD optimizer to update network weights with an initial learning rate of 0.025, momentum of 0.9, and weight decay of 3×10⁻⁴. We search for 50 epochs with a batch size of 64. We use an SGD optimizer with a weight decay of 3×10⁻⁴ and a momentum of 0.9. The initial learning rate starts from 0.025 and follows the cosine annealing strategy to a minimum of 0. The network is trained from scratch for 600 epochs with a batch size of 96. We use the SGD optimizer with an initial learning rate of 0.5, weight decay of 4×10⁻⁵, and momentum of 0.9. The network is trained for 240 epochs with a batch size of 1024 on 8 NVIDIA V100 GPUs. (A hedged sketch of the search-phase optimizer configuration follows the table.) |
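The split-and-select procedure quoted in the Dataset Splits row can be illustrated in code. The following is a minimal sketch, not the authors' implementation: the `get_attention_weights` hook, the stand-in data pipeline, and the per-edge tensor layout are assumptions; only the half/half split of the CIFAR-10 training set and the rule of keeping the operation with the largest attention weight come from the paper.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Search-phase split described in the paper: half of the CIFAR-10 training
# set trains the super-network and attention module, the other half is only
# forwarded to read out attention weights.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
half = len(train_set) // 2
weight_split, val_split = random_split(train_set, [half, len(train_set) - half])

weight_loader = DataLoader(weight_split, batch_size=64, shuffle=True)
val_loader = DataLoader(val_split, batch_size=64, shuffle=False)

def select_operations(supernet, loader, device="cuda"):
    """For every edge, keep the candidate operation with the largest attention
    weight averaged over the validation split.

    `supernet.get_attention_weights(x)` is a hypothetical hook returning a
    tensor of shape (num_edges, num_ops); the released AGNAS code may expose
    these weights differently.
    """
    supernet.eval()
    totals, batches = None, 0
    with torch.no_grad():
        for images, _ in loader:
            weights = supernet.get_attention_weights(images.to(device))
            totals = weights if totals is None else totals + weights
            batches += 1
    mean_weights = totals / batches
    return mean_weights.argmax(dim=-1)  # index of the chosen operation per edge
```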
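The optimizer settings quoted in the Experiment Setup row correspond to a standard SGD-plus-cosine-annealing schedule. The sketch below wires up only the search-phase values reported for CIFAR-10 (learning rate 0.025, momentum 0.9, weight decay 3×10⁻⁴, cosine annealing to 0 over 50 epochs, batch size 64); the tiny stand-in network and dummy data are placeholders so the snippet runs, not the AGNAS super-network.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for the CIFAR-10 half used to train the weights.
weight_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))),
    batch_size=64, shuffle=True,
)

# Tiny stand-in model; substitute the AGNAS super-network with its attention
# module in practice.
supernet = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(
    supernet.parameters(),
    lr=0.025,          # initial learning rate quoted above
    momentum=0.9,
    weight_decay=3e-4,
)
# Cosine annealing from 0.025 down to a minimum of 0 over the 50 search epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=0.0)

for epoch in range(50):
    for images, labels in weight_loader:  # batch size 64, as quoted
        optimizer.zero_grad()
        loss = criterion(supernet(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The CIFAR-10 retraining (600 epochs, batch size 96) and the ImageNet retraining (learning rate 0.5, weight decay 4×10⁻⁵, 240 epochs, batch size 1024) quoted in the same row would swap the corresponding hyper-parameters into this same optimizer/scheduler pattern.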