Network Morphism

Authors: Tao Wei, Changhu Wang, Yong Rui, Chang Wen Chen

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme.
Researcher Affiliation | Collaboration | Microsoft Research, Beijing, China, 100080; Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, 14260
Pseudocode | Yes | Algorithm 1: General Network Morphism; Algorithm 2: Practical Network Morphism (the function-preserving idea is sketched after this table).
Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository for the described methodology.
Open Datasets | Yes | The first experiment is conducted on the MNIST dataset (LeCun et al., 1998). Extensive experiments were conducted on the CIFAR10 dataset (Krizhevsky & Hinton, 2009). Experiments were also conducted on the ImageNet dataset (Russakovsky et al., 2014).
Dataset Splits | Yes | MNIST is a standard dataset for handwritten digit recognition, with 60,000 training images and 10,000 testing images. CIFAR10 is an image recognition dataset of 32×32 color images, with 50,000 training images and 10,000 testing images. The ImageNet models were trained on 1.28 million training images and tested on 50,000 validation images.
Hardware Specification | No | VGG16 was trained for around 2~3 months of single-GPU time (Simonyan & Zisserman, 2014). The paper mentions only a "single GPU" without specifying a model or other hardware details.
Software Dependencies | No | The baseline network adopted is the Caffe (Jia et al., 2014) cifar10_quick model. Caffe is named, but no version number is given.
Experiment Setup | Yes | The sharp drop and increase in Fig. 8 are caused by changes of the learning rate. Since the parent network was learned with a much finer learning rate (1e-5) at the end of its training, training of the morphed network was restarted at a coarser learning rate (1e-3), hence the initial sharp drop. At 20k/30k iterations, the learning rate was reduced to 1e-4/1e-5, which caused the sharp increase. (This schedule is sketched after the table.)
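The Pseudocode row above refers to the paper's Algorithm 1 (General Network Morphism) and Algorithm 2 (Practical Network Morphism), which are not reproduced here. The minimal NumPy sketch below only illustrates the constraint those algorithms enforce: after morphing, the child network must compute exactly the same function as its well-trained parent. It does so in the simplest possible way, inserting a new fully-connected layer initialized to the identity; the paper's algorithms instead derive the new weights by decomposing an existing layer's kernel. All variable names are illustrative.

```python
import numpy as np

# A hypothetical parent "network": one fully-connected layer with weights W.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))     # maps a 16-d input to an 8-d output
x = rng.standard_normal(16)          # an arbitrary input
parent_out = W @ x

# Depth morphism (simplest case): split the layer into W2 @ W1 with W2 @ W1 == W
# by initializing the newly inserted layer W1 as the identity.
W1 = np.eye(16)                      # new layer, identity-initialized
W2 = W.copy()                        # original weights carried over unchanged

child_out = W2 @ (W1 @ x)

# The morphed (child) network preserves the parent's function exactly,
# so further training can continue from the parent's accuracy rather than from scratch.
assert np.allclose(parent_out, child_out)
```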
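The Experiment Setup row describes a multi-step learning-rate schedule for retraining the morphed network: a coarse rate of 1e-3 from the start, reduced to 1e-4 at 20k iterations and to 1e-5 at 30k iterations. The short Python sketch below encodes that schedule; the function name and the iteration-based formulation (rather than a Caffe solver configuration, which the table does not detail) are assumptions for illustration.

```python
def learning_rate(iteration: int) -> float:
    """Learning rate at a given training iteration (assumed step schedule:
    1e-3 from the start, 1e-4 after 20k iterations, 1e-5 after 30k)."""
    if iteration < 20_000:
        return 1e-3   # coarse rate restored when retraining the morphed network
    elif iteration < 30_000:
        return 1e-4   # first reduction, at 20k iterations
    else:
        return 1e-5   # final reduction, at 30k iterations

# Example: rates at a few checkpoints along training.
for it in (0, 19_999, 20_000, 30_000):
    print(it, learning_rate(it))
```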