PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

Authors: Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.57% on CIFAR10 with merely 0.1 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.2% on ImageNet (under the mobile setting) using 3.8 GPU-days for search.
Researcher Affiliation | Collaboration | (1) Shanghai Jiao Tong University, (2) Huawei Noah's Ark Lab, (3) Tongji University, (4) Futurewei Technologies
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code has been made available at https://github.com/yuhuixu1993/PC-DARTS.
Open Datasets | Yes | We perform experiments on CIFAR10 and ImageNet, two most popular datasets for evaluating neural architecture search. CIFAR10 (Krizhevsky & Hinton, 2009) consists of 60K images, all of which are of a spatial resolution of 32×32. ImageNet (Deng et al., 2009) contains 1,000 object categories, 1.3M training images, and 50K validation images.
Dataset Splits | Yes | The 50K training set of CIFAR10 is split into two subsets with equal size, with one subset used for training network weights and the other used for architecture hyper-parameters. ... We ran PC-DARTS 5 times and used standalone validation to pick the best from the 5 runs. This process was done by using 45K out of 50K training images for training, and the remaining 5K images for validation. ... To reduce search time, we randomly sample two subsets from the 1.3M training set of ImageNet, with 10% and 2.5% images, respectively. The former one is used for training network weights and the latter for updating hyper-parameters. (A split sketch follows this table.)
Hardware Specification | Yes | The entire search process only requires 3 hours on a GTX 1080Ti GPU, or 1.5 hours on a Tesla V100 GPU ... We use eight Tesla V100 GPUs for search.
Software Dependencies | No | The paper mentions optimizers such as momentum SGD and Adam, but does not specify software versions for any libraries, frameworks, or languages used (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | We train the network for 50 epochs, with the initial number of channels being 16. The 50K training set of CIFAR10 is split into two subsets with equal size, with one subset used for training network weights and the other used for architecture hyper-parameters. ... batch size during search is increased from 64 to 256. ... initial learning rate of 0.1 (annealed down to zero following a cosine schedule without restart), a momentum of 0.9, and a weight decay of 3×10⁻⁴. We use an Adam optimizer ... with a fixed learning rate of 6×10⁻⁴, a momentum of (0.5, 0.999), and a weight decay of 10⁻³. (A training-configuration sketch follows this table.)
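For readers trying to reproduce the Dataset Splits row, the following is a minimal sketch, assuming a PyTorch/torchvision setup rather than the authors' released code, of the equal-size CIFAR10 split between weight training and architecture-hyper-parameter training. The variable names and the use of SubsetRandomSampler are illustrative choices, not taken from the PC-DARTS repository; only the 50K/25K+25K split and the batch size of 256 come from the quotes above.

```python
# Minimal sketch (assumed PyTorch/torchvision setup, not the released PC-DARTS code)
# of the CIFAR10 split described above: the 50K training images are divided into two
# equal halves, one for network weights and one for architecture hyper-parameters.
import numpy as np
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, SubsetRandomSampler

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_data = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=train_transform)

num_train = len(train_data)          # 50,000 images
indices = list(range(num_train))
np.random.shuffle(indices)
split = num_train // 2               # two equal-size subsets

# One half updates the network weights, the other updates the architecture
# hyper-parameters; batch size 256 follows the quoted search setting.
weight_loader = DataLoader(
    train_data, batch_size=256,
    sampler=SubsetRandomSampler(indices[:split]), num_workers=2)
arch_loader = DataLoader(
    train_data, batch_size=256,
    sampler=SubsetRandomSampler(indices[split:]), num_workers=2)
```

The same pattern, applied to the ImageNet training set with randomly sampled 10% and 2.5% subsets, would mirror the ImageNet split quoted in the Dataset Splits row.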
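The optimizer settings quoted in the Experiment Setup row also map directly onto standard PyTorch calls. The sketch below is an assumption about how they could be wired up, not the authors' implementation: the parameter groups are placeholders standing in for the PC-DARTS supernet, which holds both convolution weights and architecture parameters.

```python
# Minimal sketch (assumed PyTorch; the parameter groups below are toy placeholders,
# not the PC-DARTS supernet) of the search-phase optimizers quoted above.
import torch
import torch.nn as nn

# Placeholder parameter groups so the optimizer wiring is runnable:
# real code would pass the supernet's weights and its architecture parameters.
conv_weights = nn.Conv2d(3, 16, 3, padding=1).parameters()
arch_params = [torch.zeros(14, 8, requires_grad=True)]  # shape is illustrative only

epochs = 50  # search runs for 50 epochs per the quoted setup

# Momentum SGD for network weights: initial lr 0.1 annealed to zero by a cosine
# schedule without restart, momentum 0.9, weight decay 3e-4.
w_optimizer = torch.optim.SGD(
    conv_weights, lr=0.1, momentum=0.9, weight_decay=3e-4)
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    w_optimizer, T_max=epochs, eta_min=0.0)

# Adam for architecture hyper-parameters: fixed lr 6e-4, betas (0.5, 0.999),
# weight decay 1e-3.
a_optimizer = torch.optim.Adam(
    arch_params, lr=6e-4, betas=(0.5, 0.999), weight_decay=1e-3)
```

The "momentum of (0.5, 0.999)" in the quoted text corresponds to Adam's beta coefficients, which is how it is expressed in the sketch.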