Neural Inheritance Relation Guided One-Shot Layer Assignment Search

Authors: Rang Meng, Weijie Chen, Di Xie, Yuan Zhang, Shiliang Pu

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive experiments carried out on CIFAR-100 illustrate the efficiency of our proposed method. Our search results are strongly consistent with the optimal ones directly selected from the architecture dataset. To further confirm the generalization of our proposed method, we also conduct experiments on Tiny-ImageNet and ImageNet."
Researcher Affiliation | Collaboration | Rang Meng (1), Weijie Chen (2), Di Xie (2), Yuan Zhang (2), Shiliang Pu (2). (1) College of Control Science and Engineering, Zhejiang University; (2) Hikvision Research Institute. Emails: r_meng@zju.edu.cn, {chenweijie5, xiedi, zhangyuan, pushiliang}@hikvision.com
Pseudocode | Yes | Algorithm 1: Layer Assignment Search Algorithm (a hedged sketch of the greedy procedure this suggests appears after the table)
Open Source Code | No | "Bringing this question, we build a neural architecture dataset of different layer assignments, which consists of 908 different neural networks trained on CIFAR-100, including plain networks and residual networks (we will release later)."
Open Datasets | Yes | "CIFAR-100 (Krizhevsky, Hinton, and others 2009) is a dataset for 100-classes image classification. ... Tiny-ImageNet is a subset of ImageNet for 200-classes image classification. ... ImageNet (Russakovsky et al. 2015) is a 1000-classes image classification dataset..."
Dataset Splits | Yes | "There are 500 training images and 100 testing images per class with resolution 32×32. ... There are 500 training images, 50 validation images and 50 testing images per class with resolution 64×64."
Hardware Specification | Yes | "We totally use 7 GPUs for training. ... TITAN XP"
Software Dependencies | No | Not found. The paper mentions "Pytorch" but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "During the training phase, we first zero-pad the images with 4 pixels on each side and then randomly crop them to produce 32×32 images, followed by random horizontal flipping. We normalize them by channel-mean subtraction and standard-deviation division for both the training and validation sets. While building the architecture dataset of layer assignments, we train all the enumerated networks in PyTorch using SGD with Nesterov momentum 0.9. The base learning rate is set to 0.1 and multiplied by a factor of 0.2 at epochs 60, 120, and 160, respectively. Weight decay is set to 0.0005. All networks are trained with batch size 128 for 200 epochs." A hedged PyTorch sketch of this recipe follows below.
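
The Experiment Setup row is concrete enough to transcribe. Below is a minimal sketch of the quoted recipe, assuming standard PyTorch/torchvision APIs. The model (a stock ResNet-18 stand-in) and the CIFAR-100 normalization statistics are assumptions: the paper trains its own 908 enumerated plain/residual networks and does not list exact channel means or standard deviations.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Preprocessing as quoted: 4-pixel zero-padding, random 32x32 crop,
# random horizontal flip, then per-channel mean/std normalization.
# These mean/std values are commonly used CIFAR-100 statistics -- an
# assumption, since the paper does not give the exact numbers.
normalize = T.Normalize(mean=(0.5071, 0.4865, 0.4409),
                        std=(0.2673, 0.2564, 0.2762))
train_tf = T.Compose([T.RandomCrop(32, padding=4),
                      T.RandomHorizontalFlip(),
                      T.ToTensor(),
                      normalize])

train_set = torchvision.datasets.CIFAR100(root="./data", train=True,
                                          download=True, transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)

# Stand-in model: any of the paper's enumerated layer assignments would be
# trained this way; ResNet-18 is used here only for illustration.
model = torchvision.models.resnet18(num_classes=100)
criterion = nn.CrossEntropyLoss()

# SGD with Nesterov momentum 0.9 and weight decay 5e-4; base LR 0.1,
# multiplied by 0.2 at epochs 60, 120, and 160; 200 epochs, batch size 128.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```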
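
For the Pseudocode row: Algorithm 1 itself is not reproduced in this report, but its title and the paper's central premise (a neural inheritance relation under which the best layer assignment with l+1 layers can be inherited from the best assignment with l layers by deepening one stage) suggest a greedy, depth-growing search. The sketch below is a hypothetical rendering under that assumption; `evaluate` is a hypothetical stand-in for scoring a candidate assignment, e.g. with shared one-shot weights.

```python
from typing import Callable, List, Tuple

Assignment = Tuple[int, ...]  # layers assigned per stage, e.g. (2, 3, 4)

def layer_assignment_search(num_stages: int,
                            max_layers: int,
                            evaluate: Callable[[Assignment], float]
                            ) -> List[Assignment]:
    """Greedily grow the layer assignment one layer at a time.

    At each step, every child that deepens exactly one stage of the
    current best assignment is scored, and the best child is kept:
    under the inheritance relation, the optimum at depth l+1 lies
    among the children of the optimum at depth l.
    """
    best: Assignment = (1,) * num_stages  # start with one layer per stage
    history = [best]
    while sum(best) < max_layers:
        children = [best[:s] + (best[s] + 1,) + best[s + 1:]
                    for s in range(num_stages)]
        best = max(children, key=evaluate)
        history.append(best)
    return history
```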