SI-VDNAS: Semi-Implicit Variational Dropout for Hierarchical One-shot Neural Architecture Search

Authors: Yaoming Wang, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that SI-VDNAS finds a convergent architecture with only 2.7 MB parameters within 0.8 GPU-days and can achieve 2.60% top-1 error rate on CIFAR-10. (Section 4, Experiments)
Researcher Affiliation | Academia | 1) Department of Electronic Engineering, Shanghai Jiao Tong University, China; 2) Department of Computer Science & Engineering, Shanghai Jiao Tong University, China. {wang_yaoming, daiwenrui, lcl1985, zoujunni, xionghongkai}@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1: Semi-implicit Variational Dropout NAS
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | CIFAR-10/100 [Krizhevsky and Hinton, 2009] is a popular dataset consisting of 60K images, 50K training images and 10K test images. ... ImageNet [Deng et al., 2009] is a large-scale benchmark for image classification. (Section 4.1, Datasets)
Dataset Splits | Yes | For the training, we split the training images into two subsets with the same size. One subset is used for training network parameters, the other is used for architectural parameters. CIFAR-10/100 [Krizhevsky and Hinton, 2009] is a popular dataset consisting of 60K images, 50K training images and 10K test images. (A minimal sketch of this split is given after the table.)
Hardware Specification | Yes | The search process requires 8 GPU-hours for an optimal structure within 50 epochs and 20 GPU-hours for a convergent result within 150 epochs on a single NVIDIA GTX 1080Ti GPU. The search time can be reduced by about 50% on a single Tesla V100 GPU.
Software Dependencies | No | The paper mentions optimizers (e.g., SGD) and techniques (e.g., Cutout, the drop-path trick) but does not specify any software libraries or frameworks with version numbers (e.g., TensorFlow, PyTorch, scikit-learn) required to replicate the experiments.
Experiment Setup | Yes | The initial number of channels is set to 36. The network weights are trained from scratch using all the 50K training images with a batch size of 96. The network is trained for 600 epochs. We use the SGD optimizer with an initial learning rate of 0.025 (annealed down to zero following a cosine schedule without restart), a momentum of 0.9, a weight decay of 3×10⁻⁴/5×10⁻⁴, and gradient norm clipping at 5. We apply the drop-path trick with the probability of 0.3. Cutout is also used in our evaluation. (See the training-setup sketch below.)
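
The 50/50 split quoted under Dataset Splits can be expressed with index-based samplers. This is only a minimal sketch assuming PyTorch and torchvision, which the paper does not name; the batch size and data path are illustrative choices, not values taken from the paper.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

# CIFAR-10 training set (50K images), split into two equal halves as described:
# one half updates network weights, the other updates architecture parameters.
train_data = datasets.CIFAR10("./data", train=True, download=True,
                              transform=transforms.ToTensor())
indices = torch.randperm(len(train_data)).tolist()
split = len(train_data) // 2  # 25,000 / 25,000

weight_loader = DataLoader(train_data, batch_size=64,
                           sampler=SubsetRandomSampler(indices[:split]))
arch_loader = DataLoader(train_data, batch_size=64,
                         sampler=SubsetRandomSampler(indices[split:]))
```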
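
The retraining recipe quoted under Experiment Setup maps onto a standard training loop. The sketch below assumes PyTorch and torchvision (again, the paper does not name a framework): the stand-in model, the Cutout patch length of 16, and the data path are assumptions, while the batch size of 96, 600 epochs, SGD with learning rate 0.025 annealed by a cosine schedule, momentum 0.9, weight decay 3×10⁻⁴, and gradient norm clipping at 5 follow the quoted setup. Drop-path with probability 0.3 is a property of the searched network and is omitted here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class Cutout:
    """Zero out one random square patch per image (the Cutout augmentation)."""
    def __init__(self, length=16):
        self.length = length

    def __call__(self, img):  # img: Tensor of shape [C, H, W]
        _, h, w = img.shape
        y, x = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
        y1, y2 = max(0, y - self.length // 2), min(h, y + self.length // 2)
        x1, x2 = max(0, x - self.length // 2), min(w, x + self.length // 2)
        img[:, y1:y2, x1:x2] = 0.0
        return img

# Retraining uses all 50K CIFAR-10 training images with a batch size of 96.
train_loader = DataLoader(
    datasets.CIFAR10("./data", train=True, download=True,
                     transform=transforms.Compose([transforms.ToTensor(),
                                                   Cutout(length=16)])),
    batch_size=96, shuffle=True)

# Stand-in model: the paper's searched cell-based network is not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

optimizer = torch.optim.SGD(model.parameters(), lr=0.025,
                            momentum=0.9, weight_decay=3e-4)
# Cosine schedule annealing the learning rate to zero over 600 epochs, no restarts.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=600)
criterion = nn.CrossEntropyLoss()

for epoch in range(600):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), 5.0)  # gradient norm clipping at 5
        optimizer.step()
    scheduler.step()
```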