Revisiting Parameter Sharing for Automatic Neural Channel Number Search

Authors: Jiaxing Wang, Haoli Bai, Jiaxiang Wu, Xupeng Shi, Junzhou Huang, Irwin King, Michael Lyu, Jian Cheng

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to study the effects of parameter sharing on channel number search. Besides, the transitionary sharing strategy is shown to achieve a better balance between efficient searching and architecture discrimination. Experimental results on both CIFAR-10 and ImageNet datasets show that our approach outperforms a number of competitive counterparts.
Researcher Affiliation | Collaboration | Jiaxing Wang (1,3), Haoli Bai (2), Jiaxiang Wu (4), Xupeng Shi (5), Junzhou Huang (4,6), Irwin King (2), Michael Lyu (2), Jian Cheng (1,3). Affiliations: 1 NLPR, Institute of Automation, Chinese Academy of Sciences; 2 The Chinese University of Hong Kong; 3 School of Artificial Intelligence, University of Chinese Academy of Sciences; 4 Tencent AI Lab; 5 Northeastern University; 6 University of Texas at Arlington.
Pseudocode | Yes | An overall workflow is shown in Algorithm 1 of Appendix A.
Open Source Code | Yes | Code is available at https://github.com/haolibai/APS-channel-search.
Open Datasets | Yes | We conduct experiments on CIFAR-10 [17] and ImageNet 2012 [15], following standard data preprocessing techniques in [9, 27]. (A sketch of such standard preprocessing appears after this table.)
Dataset Splits | No | The paper mentions evaluating on the "validation set" and "warm-up training," and references "standard data preprocessing techniques," but does not provide explicit percentages, sample counts, or citations specifying the training/validation/test splits needed to reproduce the data partitioning.
Hardware Specification | Yes | To be consistent with [6], the total searching epoch is set to 600, which can be finished within 6.9 hours for ResNet-20 and 8.6 hours for ResNet-56 on a single NVIDIA Tesla-P40. ... The whole searching process can be finished within 24 hours for ResNet-18 and 48 hours for MobileNet-v2 on four NVIDIA Tesla-P40s.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) needed to replicate the experiments. It implicitly refers to frameworks common in deep learning but lacks concrete version details.
Experiment Setup | Yes | A brief summarization of experimental setup is introduced below, while complete hyper-parameter settings and implementation details can be found in Appendix C. CIFAR-10 Experiments: For CIFAR-10, we take ResNet [9] as base models similar to [6, 12]. To be consistent with [6], the total searching epoch is set to 600, which can be finished within 6.9 hours for ResNet-20 and 8.6 hours for ResNet-56 on a single NVIDIA Tesla-P40. The first 200 epochs are used for warm-up training with fixed P, Q, and candidate architectures are uniformly sampled from C. The rest 400 epochs are left for transition and training of the RL controller. We set C = {16, 32, 64, 96} for the analysis of parameter sharing in Section 5.2 and 100% FLOPs search, and C = {4, 8, 16, 32, 64} when searching for more compact models to compare to other baselines in Section 5.3. ... ImageNet Experiments: For ImageNet experiments, we choose ResNet-18 and MobileNet-v2 as base models. For memory efficiency, we increase candidate channels after each down-sampling layer according to default expansion rates of base models. The initial candidates C are set to {32, 48, 64, 80} for ResNet-18 and {8, 12, 16, 20} for MobileNet-v2 respectively. We search for 160 epochs where the first 80 epochs are for warm-up training. (An illustrative sketch of this warm-up-then-controller schedule follows below.)
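The Open Datasets row cites "standard data preprocessing techniques in [9, 27]" without restating them. Below is a minimal PyTorch sketch of the preprocessing conventionally used with ResNets on CIFAR-10 (4-pixel padding with a random 32x32 crop, horizontal flip, per-channel normalization); the exact transforms and normalization statistics here are common defaults, not values quoted from the paper, and the authors' released code should be treated as the authoritative reference.

```python
# Sketch of "standard" CIFAR-10 preprocessing; values are common defaults,
# not quoted from the paper.
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),      # pad 4 pixels, then random 32x32 crop
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
test_transform = T.Compose([
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

train_set = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
test_set = CIFAR10(root="./data", train=False, download=True, transform=test_transform)
```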
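The Experiment Setup row describes a two-phase schedule: warm-up epochs in which candidate architectures are sampled uniformly from C while the sharing matrices P and Q stay fixed, followed by epochs that train the RL controller. The sketch below only illustrates that schedule under the assumption of independent per-layer uniform sampling; the layer count, train_supernet_step, and controller are hypothetical placeholders, and this is not a reproduction of the paper's Algorithm 1.

```python
# Illustrative sketch of the search schedule described above (not the authors' code).
import random

C = [16, 32, 64, 96]          # candidate channel numbers (CIFAR-10, 100% FLOPs search)
NUM_SEARCHABLE_LAYERS = 20    # assumption for illustration only
WARMUP_EPOCHS, TOTAL_EPOCHS = 200, 600

def sample_uniform_arch(candidates, num_layers):
    """Pick a channel number for every searchable layer uniformly at random."""
    return [random.choice(candidates) for _ in range(num_layers)]

for epoch in range(TOTAL_EPOCHS):
    if epoch < WARMUP_EPOCHS:
        # Warm-up phase: P, Q fixed; architectures drawn uniformly from C.
        arch = sample_uniform_arch(C, NUM_SEARCHABLE_LAYERS)
        # train_supernet_step(arch)        # hypothetical shared-weight training call
    else:
        # Transition phase: an RL controller proposes channel configurations and is
        # updated with a reward (e.g., validation accuracy); placeholders only.
        # arch = controller.sample()
        # train_supernet_step(arch)
        # controller.update(reward_of(arch))
        pass
```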