PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

Authors: Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.57% on CIFAR10 with merely 0.1 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.2% on ImageNet (under the mobile setting) using 3.8 GPU-days for search.
Researcher Affiliation | Collaboration | (1) Shanghai Jiao Tong University, (2) Huawei Noah's Ark Lab, (3) Tongji University, (4) Futurewei Technologies
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code has been made available at https://github.com/yuhuixu1993/PC-DARTS.
Open Datasets | Yes | We perform experiments on CIFAR10 and ImageNet, two most popular datasets for evaluating neural architecture search. CIFAR10 (Krizhevsky & Hinton, 2009) consists of 60K images, all of which are of a spatial resolution of 32×32. ImageNet (Deng et al., 2009) contains 1,000 object categories, 1.3M training images, and 50K validation images.
Dataset Splits | Yes | The 50K training set of CIFAR10 is split into two subsets with equal size, with one subset used for training network weights and the other used for architecture hyper-parameters. ... We ran PC-DARTS 5 times and used standalone validation to pick the best from the 5 runs. This process was done by using 45K out of 50K training images for training, and the remaining 5K images for validation. ... To reduce search time, we randomly sample two subsets from the 1.3M training set of ImageNet, with 10% and 2.5% images, respectively. The former one is used for training network weights and the latter for updating hyper-parameters. (A split sketch follows this table.)
Hardware Specification | Yes | The entire search process only requires 3 hours on a GTX 1080Ti GPU, or 1.5 hours on a Tesla V100 GPU ... We use eight Tesla V100 GPUs for search.
Software Dependencies | No | The paper mentions optimizers such as momentum SGD and Adam, but does not specify software versions for any libraries, frameworks, or languages used (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | We train the network for 50 epochs, with the initial number of channels being 16. The 50K training set of CIFAR10 is split into two subsets with equal size, with one subset used for training network weights and the other used for architecture hyper-parameters. ... batch size during search is increased from 64 to 256. ... initial learning rate of 0.1 (annealed down to zero following a cosine schedule without restart), a momentum of 0.9, and a weight decay of 3×10⁻⁴. We use an Adam optimizer ... with a fixed learning rate of 6×10⁻⁴, a momentum of (0.5, 0.999), and a weight decay of 10⁻³. (A training-configuration sketch follows this table.)
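For readers trying to reproduce the Dataset Splits row, the following is a minimal sketch, assuming a PyTorch/torchvision setup rather than the authors' released code, of the equal-size CIFAR10 split between weight training and architecture-hyper-parameter training. The variable names and the use of SubsetRandomSampler are illustrative choices, not taken from the PC-DARTS repository; only the 50K/25K+25K split and the batch size of 256 come from the quotes above.

```python
# Minimal sketch (assumed PyTorch/torchvision setup, not the released PC-DARTS code)
# of the CIFAR10 split described above: the 50K training images are divided into two
# equal halves, one for network weights and one for architecture hyper-parameters.
import numpy as np
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, SubsetRandomSampler

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_data = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=train_transform)

num_train = len(train_data)          # 50,000 images
indices = list(range(num_train))
np.random.shuffle(indices)
split = num_train // 2               # two equal-size subsets

# One half updates the network weights, the other updates the architecture
# hyper-parameters; batch size 256 follows the quoted search setting.
weight_loader = DataLoader(
    train_data, batch_size=256,
    sampler=SubsetRandomSampler(indices[:split]), num_workers=2)
arch_loader = DataLoader(
    train_data, batch_size=256,
    sampler=SubsetRandomSampler(indices[split:]), num_workers=2)
```

The same pattern, applied to the ImageNet training set with randomly sampled 10% and 2.5% subsets, would mirror the ImageNet split quoted in the Dataset Splits row.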
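The optimizer settings quoted in the Experiment Setup row also map directly onto standard PyTorch calls. The sketch below is an assumption about how they could be wired up, not the authors' implementation: the parameter groups are placeholders standing in for the PC-DARTS supernet, which holds both convolution weights and architecture parameters.

```python
# Minimal sketch (assumed PyTorch; the parameter groups below are toy placeholders,
# not the PC-DARTS supernet) of the search-phase optimizers quoted above.
import torch
import torch.nn as nn

# Placeholder parameter groups so the optimizer wiring is runnable:
# real code would pass the supernet's weights and its architecture parameters.
conv_weights = nn.Conv2d(3, 16, 3, padding=1).parameters()
arch_params = [torch.zeros(14, 8, requires_grad=True)]  # shape is illustrative only

epochs = 50  # search runs for 50 epochs per the quoted setup

# Momentum SGD for network weights: initial lr 0.1 annealed to zero by a cosine
# schedule without restart, momentum 0.9, weight decay 3e-4.
w_optimizer = torch.optim.SGD(
    conv_weights, lr=0.1, momentum=0.9, weight_decay=3e-4)
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    w_optimizer, T_max=epochs, eta_min=0.0)

# Adam for architecture hyper-parameters: fixed lr 6e-4, betas (0.5, 0.999),
# weight decay 1e-3.
a_optimizer = torch.optim.Adam(
    arch_params, lr=6e-4, betas=(0.5, 0.999), weight_decay=1e-3)
```

The "momentum of (0.5, 0.999)" in the quoted text corresponds to Adam's beta coefficients, which is how it is expressed in the sketch.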