PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search
Authors: Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.57% on CIFAR10 with merely 0.1 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.2% on ImageNet (under the mobile setting) using 3.8 GPU-days for search. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Huawei Noah's Ark Lab, 3 Tongji University, 4 Futurewei Technologies |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code has been made available at https://github.com/yuhuixu1993/PC-DARTS. |
| Open Datasets | Yes | We perform experiments on CIFAR10 and ImageNet, two most popular datasets for evaluating neural architecture search. CIFAR10 (Krizhevsky & Hinton, 2009) consists of 60K images, all of which are of a spatial resolution of 32×32. ImageNet (Deng et al., 2009) contains 1,000 object categories, and 1.3M training images and 50K validation images. |
| Dataset Splits | Yes | The 50K training set of CIFAR10 is split into two subsets with equal size, with one subset used for training network weights and the other used for architecture hyper-parameters. ... We ran PC-DARTS 5 times and used standalone validation to pick the best from the 5 runs. This process was done by using 45K out of 50K training images for training, and the remaining 5K images for validation. ... To reduce search time, we randomly sample two subsets from the 1.3M training set of ImageNet, with 10% and 2.5% images, respectively. The former one is used for training network weights and the latter for updating hyper-parameters. (A minimal data-split sketch is given after this table.) |
| Hardware Specification | Yes | The entire search process only requires 3 hours on a GTX 1080Ti GPU, or 1.5 hours on a Tesla V100 GPU ... We use eight Tesla V100 GPUs for search |
| Software Dependencies | No | The paper mentions optimizers like 'momentum SGD' and 'Adam optimizer' but does not specify software versions for any libraries, frameworks, or languages used (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | We train the network for 50 epochs, with the initial number of channels being 16. The 50K training set of CIFAR10 is split into two subsets with equal size, with one subset used for training network weights and the other used for architecture hyper-parameters. ... batch size during search is increased from 64 to 256. ... initial learning rate of 0.1 (annealed down to zero following a cosine schedule without restart), a momentum of 0.9, and a weight decay of 3×10⁻⁴. We use an Adam optimizer ... with a fixed learning rate of 6×10⁻⁴, a momentum of (0.5, 0.999) and a weight decay of 10⁻³. (An optimizer-setup sketch is given after this table.) |
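
The equal split of CIFAR10's 50K training images described in the Dataset Splits row can be reproduced with standard PyTorch data utilities. The snippet below is a minimal sketch, assuming `torchvision` datasets and `SubsetRandomSampler`; the variable names and the batch size of 256 (quoted in the Experiment Setup row) are illustrative and are not taken from the official PC-DARTS repository.

```python
# Hedged sketch: split CIFAR10's 50K training set into two equal halves,
# one for training network weights and one for updating the architecture
# hyper-parameters, as described in the paper.
import numpy as np
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, SubsetRandomSampler

train_data = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())

num_train = len(train_data)               # 50,000 images
indices = np.random.permutation(num_train)
split = num_train // 2                    # two equal-sized subsets

# First half: network weights. Second half: architecture hyper-parameters.
weight_loader = DataLoader(
    train_data, batch_size=256,
    sampler=SubsetRandomSampler(indices[:split]))
arch_loader = DataLoader(
    train_data, batch_size=256,
    sampler=SubsetRandomSampler(indices[split:]))
```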
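
The optimizer settings quoted in the Experiment Setup row map directly onto standard PyTorch optimizer calls: momentum SGD with a cosine-annealed learning rate for the network weights, and Adam for the architecture hyper-parameters. The sketch below uses placeholder parameter groups in place of the real supernet, so it illustrates only the hyper-parameter configuration, not the authors' training code.

```python
# Hedged sketch of the search-stage optimizers reported in the paper.
# `weight_params` and `arch_params` are illustrative placeholders; in the
# actual search they would be the supernet weights and the architecture
# hyper-parameters, respectively.
import torch
import torch.nn as nn

EPOCHS = 50  # search epochs reported in the paper

weight_params = [nn.Parameter(torch.randn(16, 3, 3, 3))]
arch_params = [nn.Parameter(1e-3 * torch.randn(14, 8))]

# Momentum SGD for network weights: initial lr 0.1 annealed to zero with a
# cosine schedule (no restart), momentum 0.9, weight decay 3e-4.
w_optimizer = torch.optim.SGD(
    weight_params, lr=0.1, momentum=0.9, weight_decay=3e-4)
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    w_optimizer, T_max=EPOCHS, eta_min=0.0)

# Adam for architecture hyper-parameters: lr 6e-4, betas (0.5, 0.999),
# weight decay 1e-3.
a_optimizer = torch.optim.Adam(
    arch_params, lr=6e-4, betas=(0.5, 0.999), weight_decay=1e-3)
```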