Path-Level Network Transformation for Efficient Architecture Search
Authors: Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, Yong Yu
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimented on the image classification datasets with limited computational resources (about 200 GPU-hours), where we observed improved parameter efficiency and better test results (97.70% test accuracy on CIFAR-10 with 14.3M parameters and 74.6% top-1 accuracy on ImageNet in the mobile setting), demonstrating the effectiveness and transferability of our designed architectures. |
| Researcher Affiliation | Academia | ¹Shanghai Jiao Tong University, Shanghai, China; ²Massachusetts Institute of Technology, Cambridge, USA. Correspondence to: Han Cai <hcai@apex.sjtu.edu.cn>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Experiment code: https://github.com/han-cai/Path-Level-EAS |
| Open Datasets | Yes | CIFAR-10 (Krizhevsky & Hinton, 2009) for the image classification task and transfer the learned cell structures to ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | Yes | CIFAR-10 contains 50,000 training images and 10,000 test images, where we randomly sample 5,000 images from the training set to form a validation set for the architecture search process, similar to previous work (Zoph et al., 2017; Cai et al., 2018). (A minimal sketch of this split appears after the table.) |
| Hardware Specification | No | The paper mentions 'about 200 GPU-hours' for its experiments, but does not specify the exact GPU models, CPU models, or other detailed hardware specifications used. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For the meta-controller, described in Section 3.3, the hidden state size of all LSTM units is 100 and we train it with the Adam optimizer (Kingma & Ba, 2014) using the REINFORCE algorithm (Williams, 1992). To reduce variance, we adopt a baseline function which is an exponential moving average of previous rewards with a decay of 0.95, as done in Cai et al. (2018). We also use an entropy penalty with a weight of 0.01 to ensure exploration. ... The obtained network ... is then trained for 20 epochs on CIFAR-10 with an initial learning rate of 0.035 that is further annealed with a cosine learning rate decay (Loshchilov & Hutter, 2016), a batch size of 64, a weight decay of 0.0001, using the SGD optimizer with a Nesterov momentum of 0.9. ... Additionally, we update the meta-controller with mini-batches of 10 architectures. ... In this stage, we train networks for 300 epochs with an initial learning rate of 0.1, while all other settings remain the same. ... We set the maximum depth of the cell structures to be 3 ... For nodes whose merge scheme is add, the number of branches is chosen from {2, 3}, while for nodes whose merge scheme is concatenation, the number of branches is set to be 2. (Sketches of the reward baseline and the child-network training settings appear after the table.) |
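
The Dataset Splits row above reports that 5,000 of CIFAR-10's 50,000 training images are randomly sampled as a validation set for the architecture search. The snippet below is a minimal sketch of such a split; the function name, seed handling, and use of NumPy are illustrative assumptions, not details taken from the authors' repository.

```python
# Hedged sketch of the CIFAR-10 split described in the Dataset Splits row:
# 50,000 training images, of which 5,000 are randomly sampled for validation.
# All names here are illustrative; the Path-Level-EAS repository may differ.
import numpy as np

def split_cifar10_indices(num_train=50000, num_valid=5000, seed=0):
    """Return disjoint (train, validation) index arrays."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(num_train)
    valid_idx = perm[:num_valid]   # 5,000 images held out for architecture search
    train_idx = perm[num_valid:]   # remaining 45,000 images used for training
    return train_idx, valid_idx

train_idx, valid_idx = split_cifar10_indices()
assert len(train_idx) == 45000 and len(valid_idx) == 5000
```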
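
The Experiment Setup row quotes a REINFORCE meta-controller whose variance is reduced with an exponential-moving-average reward baseline (decay 0.95). Below is a small sketch of such a baseline; the class name and the way the advantage is returned are assumptions for illustration, not the authors' implementation.

```python
# Sketch of an exponential-moving-average reward baseline with decay 0.95,
# as described for the REINFORCE update of the meta-controller.
# Illustrative only; not taken from the Path-Level-EAS repository.
class EMABaseline:
    def __init__(self, decay=0.95):
        self.decay = decay
        self.value = None  # baseline is undefined until the first reward arrives

    def advantage(self, reward):
        """Update the moving average and return (reward - baseline)."""
        if self.value is None:
            self.value = reward
        else:
            self.value = self.decay * self.value + (1.0 - self.decay) * reward
        return reward - self.value
```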
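
The same row lists the child-network training hyperparameters (SGD with Nesterov momentum 0.9, initial learning rate 0.035 annealed with cosine decay over 20 epochs, batch size 64, weight decay 0.0001, and 300 epochs with an initial learning rate of 0.1 for the final runs). A PyTorch sketch of that optimizer and schedule follows; the helper name and structure are assumptions, though the hyperparameter values come from the quoted setup.

```python
# Hedged PyTorch sketch of the child-network optimizer and cosine schedule
# quoted in the Experiment Setup row. Hyperparameter values follow the paper;
# the function itself is illustrative and not the authors' code.
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def make_child_optimizer(model, epochs=20, init_lr=0.035):
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=init_lr,          # 0.035 during search; 0.1 for the final 300-epoch runs
        momentum=0.9,
        nesterov=True,
        weight_decay=1e-4,
    )
    # Cosine learning-rate decay (Loshchilov & Hutter, 2016) over the training epochs.
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```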