Understanding and Exploring the Network with Stochastic Architectures
Authors: Zhijie Deng, Yinpeng Dong, Shifeng Zhang, Jun Zhu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first uncover the characteristics of NSA in various aspects ranging from training stability, convergence, and predictive behaviour to generalization capacity to unseen architectures. We identify various issues of the vanilla NSA, such as training/test disparity and function mode collapse, and further propose solutions to these issues with theoretical and empirical insights. Remarkable performance (e.g., 2.75% error rate and 0.0032 expected calibration error on CIFAR-10) validates the effectiveness of such a model, providing new perspectives for exploring the potential of the network with stochastic architectures, beyond NAS. |
| Researcher Affiliation | Academia | Zhijie Deng, Yinpeng Dong, Shifeng Zhang, Jun Zhu Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, 100084 China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link or an explicit statement of code release) for the methodology described. |
| Open Datasets | Yes | By convention, on the CIFAR-10 [16] task, the deployed network is divided into 3 stages in different spatial sizes, each containing 8 convolution modules... We report the results in Table 2. In both tasks, NSA-id surpasses the strong baselines with clear margins. The 2.75% error rate on CIFAR-10 is rather promising... We implement a further baseline: Average of individuals, in which we individually train 5 networks with the 5 architectures used by NSA-id, and report their average results... Finally, we use NSA-id to perform semi-supervised classification on CIFAR-10, using only 4000 labeled data. (A hedged sketch of such a stochastically wired stage appears below the table.) |
| Dataset Splits | No | The paper refers to "validation data" and "validation accuracy" (e.g., "accuracy on the validation data Dval") but does not specify the size or split proportion of the validation set for CIFAR-10 or CIFAR-100. |
| Hardware Specification | Yes | The training cost of NSA is almost identical to that of WRN-28-10, taking about 0.6 GPU day on a GTX 2080Ti for 300 training epochs. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We use wide convolutions with the widening factor of 10 for feature extraction... We set the coefficient of consistency loss to be 20, following an anneal schedule [17]. We apply standard data processing and Cutout augmentation [5]. The optimization settings follow WRN-28-10 [49]. (A hedged sketch of this training configuration appears below the table.) |
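
Since no code release is reported (see the Open Source Code row), the snippet below is a minimal, illustrative sketch of one stage of convolution modules whose wiring is re-sampled on every forward pass, matching the 3-stage / 8-module layout quoted in the Open Datasets row. The sampling rule used here (Bernoulli connections to earlier outputs with mean aggregation) and the module definition are assumptions for illustration only, not the paper's exact NSA formulation.

```python
import torch
import torch.nn as nn


class StochasticStage(nn.Module):
    """Stage of conv modules whose wiring is re-sampled on each forward pass.

    Hedged sketch of a "network with stochastic architectures"; the paper's
    actual sampling distribution and aggregation rule are not reproduced here.
    """

    def __init__(self, channels: int, num_modules: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_modules)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]
        for block in self.blocks:
            # Sample which earlier outputs feed this module (keep at least one),
            # so every forward pass realises a different sub-architecture.
            mask = torch.rand(len(outputs)) < 0.5
            if not mask.any():
                mask[-1] = True
            inp = torch.stack([o for o, m in zip(outputs, mask) if m]).mean(dim=0)
            outputs.append(block(inp))
        return outputs[-1]
```

As a usage illustration, `StochasticStage(channels=160)` could stand in for one of the three stages of a backbone with widening factor 10 (WRN-28-10 uses stage widths 160/320/640); each call then realises a different architecture, which is the behaviour the paper studies.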
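The Experiment Setup row states only that the optimization settings follow WRN-28-10 [49], that the consistency-loss coefficient of 20 follows an anneal schedule [17], and that training runs for 300 epochs on CIFAR-10 with Cutout. The concrete hyperparameters below (SGD momentum, weight decay, Nesterov, learning-rate schedule, ramp-up length, sigmoid ramp-up form) are assumptions drawn from the commonly used WRN recipe, given as a sketch rather than the authors' exact settings; Cutout itself is omitted from the snippet.

```python
import math
import torch
import torch.nn as nn

# Stand-in backbone; in the paper this would be the widening-factor-10 network
# with stochastic architectures sketched above.
model = nn.Linear(3 * 32 * 32, 10)

EPOCHS = 300            # stated: 300 training epochs
CONSISTENCY_MAX = 20.0  # stated: consistency-loss coefficient of 20
RAMPUP_EPOCHS = 80      # assumption: length of the anneal (ramp-up) schedule

# Assumed WRN-style optimization settings (SGD + momentum, weight decay 5e-4);
# the cosine schedule is also an assumption, not stated in the paper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4, nesterov=True)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)


def consistency_weight(epoch: int) -> float:
    """Sigmoid ramp-up, an assumed form of the anneal schedule of [17]."""
    t = min(epoch, RAMPUP_EPOCHS) / RAMPUP_EPOCHS
    return CONSISTENCY_MAX * math.exp(-5.0 * (1.0 - t) ** 2)
```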