Understanding and Exploring the Network with Stochastic Architectures

Authors: Zhijie Deng, Yinpeng Dong, Shifeng Zhang, Jun Zhu

NeurIPS 2020

Reproducibility assessment. Each entry below lists the reproducibility variable, the result, and the LLM response.
Research Type: Experimental. We first uncover the characteristics of NSA in various aspects, ranging from training stability, convergence, and predictive behaviour to generalization capacity on unseen architectures. We identify several issues of the vanilla NSA, such as training/test disparity and function mode collapse, and further propose solutions to these issues with theoretical and empirical insights. The remarkable performance (e.g., a 2.75% error rate and 0.0032 expected calibration error on CIFAR-10) validates the effectiveness of such a model, providing new perspectives for exploring the potential of networks with stochastic architectures, beyond NAS.
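For readers unfamiliar with the calibration metric quoted above, expected calibration error (ECE) is the bin-weighted gap between a model's confidence and its accuracy. Below is a minimal NumPy sketch of the standard binned estimator; the choice of 15 bins is an assumption, since the paper's binning is not stated here.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: weighted average of |accuracy - confidence| per bin.

    confidences: (N,) array of max softmax probabilities per prediction.
    correct:     (N,) array, 1.0 if the prediction was right, else 0.0.
    n_bins:      number of equal-width confidence bins (assumption).
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()        # empirical accuracy in bin
            conf = confidences[mask].mean()   # mean confidence in bin
            ece += mask.mean() * abs(acc - conf)
    return ece
```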
Researcher Affiliation: Academia. Zhijie Deng, Yinpeng Dong, Shifeng Zhang, Jun Zhu; Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, 100084, China.
Pseudocode: No. The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide any concrete access information (e.g., a specific repository link or an explicit statement of code release) for the methodology described.
Open Datasets: Yes. By convention, on the CIFAR-10 [16] task, the deployed network is divided into 3 stages at different spatial sizes, each containing 8 convolution modules... We report the results in Table 2. In both tasks, NSA-id surpasses the strong baselines by clear margins. The 2.75% error rate on CIFAR-10 is rather promising... We implement a further baseline, Average of individuals, in which we individually train 5 networks with the 5 architectures used by NSA-id and report their average results... Finally, we use NSA-id to perform semi-supervised classification on CIFAR-10, using only 4000 labeled examples.
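To make the "network with stochastic architectures" idea concrete, here is an illustrative PyTorch sketch of a stage whose wiring is re-sampled on every forward pass. The class name, the 50% connection probability, and the summed-subset aggregation are all assumptions made for illustration; they are not the authors' exact sampling distribution.

```python
import torch
import torch.nn as nn

class StochasticStage(nn.Module):
    """Illustrative stage of conv modules with randomly sampled wiring.

    A new connectivity pattern is drawn at every forward pass, which is
    the core idea behind stochastic architectures; the specific sampling
    scheme here is a sketch, not the paper's exact distribution.
    """

    def __init__(self, channels, n_modules=8):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(n_modules)
        )

    def forward(self, x):
        outputs = [x]
        for conv in self.convs:
            # Sample which previous outputs feed this module (at least one).
            forced = torch.randint(len(outputs), (1,)).item()
            keep = torch.rand(len(outputs)) > 0.5
            keep[forced] = True
            h = sum(o for o, k in zip(outputs, keep) if k)
            outputs.append(conv(h))
        return outputs[-1]
```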
Dataset Splits: No. The paper mentions "validation data" and "validation accuracy" (e.g., "accuracy on the validation data Dval") but does not explicitly specify the size or split percentage of the validation set for CIFAR-10 or CIFAR-100.
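Since the paper leaves the split unspecified, the sketch below shows a common CIFAR-10 convention (45,000 train / 5,000 validation) using torchvision; the sizes and the seed are assumptions, not values from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Common CIFAR-10 convention: hold out 5,000 of the 50,000 training
# images for validation. These numbers are assumptions; the paper does
# not state its split.
full_train = datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),
)
train_set, val_set = random_split(
    full_train, [45_000, 5_000],
    generator=torch.Generator().manual_seed(0),  # reproducible split
)
```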
Hardware Specification: Yes. The training cost of NSA is almost identical to that of WRN-28-10, taking about 0.6 GPU days on an RTX 2080 Ti for 300 training epochs.
Software Dependencies: No. The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup: Yes. We use wide convolutions with a widening factor of 10 for feature extraction... We set the coefficient of the consistency loss to 20, following an annealing schedule [17]. We apply standard data processing and Cutout augmentation [5]. The optimization settings follow WRN-28-10 [49].
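Reference [17] points to the consistency-regularization literature, where the loss coefficient is typically ramped up with a sigmoid schedule rather than applied at full strength from epoch 0. A hedged sketch follows: the maximum weight of 20 matches the quoted setup, while the 80-epoch ramp-up length is an assumption not stated in the paper.

```python
import math

def consistency_weight(epoch, max_weight=20.0, rampup_epochs=80):
    """Sigmoid ramp-up commonly used for consistency losses.

    max_weight=20 matches the coefficient quoted above; the ramp-up
    length of 80 epochs is an assumption, not a value from the paper.
    """
    if epoch >= rampup_epochs:
        return max_weight
    t = max(0.0, float(epoch)) / rampup_epochs
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)
```

With these defaults, the weight starts near 20 * e^-5 (about 0.13) and smoothly reaches 20 by epoch 80, after which it stays constant.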