DropNAS: Grouped Operation Dropout for Differentiable Architecture Search

Authors: Weijun Hong, Guilin Li, Weinan Zhang, Ruiming Tang, Yunhe Wang, Zhenguo Li, Yong Yu

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that DropNAS solves the above issues and achieves promising performance. Specifically, DropNAS achieves 2.26% test error on CIFAR-10, 16.39% on CIFAR-100 and 23.4% on ImageNet (with the same training hyperparameters as DARTS for a fair comparison). It is also observed that DropNAS is robust across variants of the DARTS search space. Code is available at https://github.com/huawei-noah.
Researcher Affiliation | Collaboration | Weijun Hong¹, Guilin Li², Weinan Zhang¹, Ruiming Tang², Yunhe Wang², Zhenguo Li² and Yong Yu¹; ¹Shanghai Jiao Tong University, China; ²Huawei Noah's Ark Lab, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Procedures are described in narrative text and mathematical formulations. (An illustrative sketch of a grouped operation dropout step is given after the table.)
Open Source Code | Yes | Code is available at https://github.com/huawei-noah.
Open Datasets | Yes | To benchmark our grouped operation dropout algorithm, extensive experiments are carried out on CIFAR-10, CIFAR-100 and ImageNet. Both the CIFAR-10 and CIFAR-100 datasets contain 50K training images and 10K testing images... ImageNet is a much larger dataset consisting of 1.3M images for training and 50K images for testing, equally distributed among 1,000 classes.
Dataset Splits | Yes | Both the CIFAR-10 and CIFAR-100 datasets contain 50K training images and 10K testing images... Since we use one-level optimization, the training images do not need to be split for another validation set, so the architecture search is conducted on CIFAR-10/100 with all the training images on a single Nvidia Tesla V100. (See the data-loading sketch after the table.)
Hardware Specification | Yes | The architecture search is conducted on CIFAR-10/100 with all the training images on a single Nvidia Tesla V100. ...The network is trained on a single Nvidia Tesla V100 for 600 epochs... The network is trained for 600 epochs with batch size 2048 on 8 Nvidia Tesla V100 GPUs.
Software Dependencies | No | The paper mentions optimizers like SGD and Adam, but it does not specify any software environments or libraries with version numbers (e.g., Python version, PyTorch/TensorFlow versions, CUDA versions) needed for replication.
Experiment Setup | Yes | We use 14 cells stacked with 16 channels to form the one-shot model, train the supernet for 76 epochs with batch size 96... The model weights w are optimized by SGD with initial learning rate 0.0375, momentum 0.9, and weight decay 0.0003... The architecture parameters α are optimized by Adam, with initial learning rate 0.0003, momentum (0.5, 0.999) and weight decay 0.001. Drop path rate r is fixed to 3×10⁻⁵. ...20 cells are stacked to form the evaluation network with 36 initial channels. The network is trained...for 600 epochs with batch size 192. The network parameters are optimized by SGD with learning rate 0.05, momentum 0.9 and weight decay 0.0003...
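
Since the paper describes grouped operation dropout only in prose, the sketch below is a rough, assumption-based illustration of what such a step could look like on a single DARTS-style mixed edge. The function name, the drop probability, the split into two hypothetical groups (parameterized vs. non-parameterized operations), and the rescaling rule are illustrative guesses, not the authors' specification.

```python
import torch

def grouped_operation_dropout(op_outputs, groups, drop_prob=0.5):
    """Illustrative sketch (not the authors' code): zero out randomly chosen
    candidate operations on one mixed edge, group by group, while always
    keeping at least one operation alive per group, then rescale the
    survivors (inverted-dropout style)."""
    masks = torch.ones(len(op_outputs))
    for group in groups:
        drop = torch.rand(len(group)) < drop_prob
        if bool(drop.all()):                       # never drop an entire group
            drop[torch.randint(len(group), (1,)).item()] = False
        for idx, dropped in zip(group, drop.tolist()):
            masks[idx] = 0.0 if dropped else 1.0
    keep_prob = masks.mean().clamp(min=1e-8)       # fraction of surviving ops
    return [out * (m / keep_prob) for out, m in zip(op_outputs, masks)]

# Toy usage: 8 candidate ops on one edge, split into two hypothetical groups.
outs = [torch.randn(2, 16, 8, 8) for _ in range(8)]
mixed = sum(grouped_operation_dropout(outs, groups=[[0, 1, 2, 3], [4, 5, 6, 7]]))
```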
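For the dataset split noted above, a minimal torchvision-based loading sketch follows; the transforms and worker counts are placeholders, and the point it reflects is simply that one-level optimization lets the full 50K CIFAR-10 training set drive the search while the official 10K test set stays held out.

```python
import torch
from torchvision import datasets, transforms

# Minimal sketch (assumed torchvision pipeline, not the authors' code):
# no training/validation split is carved out for the search phase.
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transforms.ToTensor())

# Batch size 96 matches the reported supernet training setting.
search_loader = torch.utils.data.DataLoader(train_set, batch_size=96, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=96, shuffle=False, num_workers=4)
```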
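The reported search-stage hyperparameters can be summarized as PyTorch optimizer settings. The sketch below uses a toy stand-in for the one-shot model; only the optimizer arguments are taken from the paper, everything else is a placeholder.

```python
import torch
import torch.nn as nn

class ToySupernet(nn.Module):
    """Stand-in for the real 14-cell / 16-channel one-shot model."""
    def __init__(self, num_edges=14, num_ops=8):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, padding=1)                          # placeholder weights w
        self.alpha = nn.Parameter(1e-3 * torch.randn(num_edges, num_ops))   # architecture parameters

supernet = ToySupernet()

# Model weights w: SGD with initial lr 0.0375, momentum 0.9, weight decay 3e-4.
w_optimizer = torch.optim.SGD(
    supernet.stem.parameters(), lr=0.0375, momentum=0.9, weight_decay=3e-4)

# Architecture parameters alpha: Adam with lr 3e-4, betas (0.5, 0.999)
# (reported as "momentum (0.5, 0.999)") and weight decay 1e-3.
alpha_optimizer = torch.optim.Adam(
    [supernet.alpha], lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3)
```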