Locally Free Weight Sharing for Network Width Search
Authors: Xiu Su, Shan You, Tao Huang, Fei Wang, Chen Qian, Changshui Zhang, Chang Xu
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted on two datasets, CIFAR-100 [28] and ImageNet [6], and on the ProxylessNAS [5] search space. Searched architectures are evaluated on CIFAR-100 and ImageNet and compared against several state-of-the-art NAS methods. |
| Researcher Affiliation | Collaboration | Xiu Su1, Shan You2,3, Tao Huang2, Fei Wang2, Chen Qian2, Changshui Zhang3, Chang Xu1. 1School of Computer Science, Faculty of Engineering, The University of Sydney; 2SenseTime Research; 3Department of Automation, Tsinghua University |
| Pseudocode | Yes | The paper presents 'Algorithm 1 Overall Training Procedure' on page 3 as a structured algorithm block. |
| Open Source Code | No | The paper states: 'The code will be available at https://github.com/megvii-research/LFWS.' The phrasing 'will be' indicates planned future availability, not concrete access at the time of publication. |
| Open Datasets | Yes | Experiments use the publicly available CIFAR-100 [28] and ImageNet [6] datasets, together with the ProxylessNAS [5] search space. |
| Dataset Splits | Yes | For CIFAR-100, the paper follows common practice [25, 5]: 50k images for training and 10k for testing, with 5k images randomly sampled from the training set as a validation set for width search. ImageNet is a large-scale dataset with 1.28M training images and 50k validation images; during search, 5k images are sampled from the training set for validation, and the official 50k validation images are used for testing (a PyTorch sketch of this split follows the table). |
| Hardware Specification | Yes | Our width search takes 0.6 GPU days with 8 NVIDIA V100 GPUs for ResNet-18 on ImageNet. |
| Software Dependencies | No | The paper states, 'We implement our method based on PyTorch.' However, it does not specify any version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | Training uses SGD with momentum 0.9. For ResNet-18 on CIFAR-100: weight decay 5e-4, initial learning rate 0.1 decayed by 0.1 at epochs 80 and 120, 160 epochs in total, batch size 256. For ResNet-18 on ImageNet: 100 epochs, initial learning rate 0.05, weight decay 3e-5, batch size 512 (a hedged PyTorch sketch of the CIFAR-100 schedule follows the table). |
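
The split protocol in the Dataset Splits row maps onto standard torchvision loaders. Below is a minimal sketch, assuming torchvision's CIFAR-100 dataset and a fixed seed; the seed, transform, and variable names are our illustration choices, not quoted from the paper.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

# Sketch of the CIFAR-100 split quoted above: 50k training images,
# 5k of them randomly sampled as a validation set for width search,
# and the official 10k test set kept for final evaluation.
torch.manual_seed(0)  # assumed seed; the paper does not report one
transform = transforms.ToTensor()  # placeholder; the paper's augmentation is not quoted

train_set = datasets.CIFAR100("./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR100("./data", train=False, download=True, transform=transform)

perm = torch.randperm(len(train_set)).tolist()  # shuffle the 50k training indices
val_idx, train_idx = perm[:5000], perm[5000:]   # 5k for search validation, 45k for training

train_loader = DataLoader(train_set, batch_size=256, sampler=SubsetRandomSampler(train_idx))
val_loader = DataLoader(train_set, batch_size=256, sampler=SubsetRandomSampler(val_idx))
test_loader = DataLoader(test_set, batch_size=256, shuffle=False)
```

The same pattern applies to ImageNet: sample 5k of the 1.28M training images for search-time validation and use the official 50k validation images as the test set.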
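The optimizer settings in the Experiment Setup row likewise correspond to standard PyTorch components. The following sketch uses the reported CIFAR-100 / ResNet-18 hyperparameters; `model` and the inner training loop are placeholders, and the stock torchvision ResNet-18 stands in for the paper's width-searched variant.

```python
import torch
from torchvision.models import resnet18

# Hedged sketch of the reported CIFAR-100 / ResNet-18 schedule:
# SGD, momentum 0.9, weight decay 5e-4, initial LR 0.1 decayed by
# 0.1 at epochs 80 and 120, for 160 epochs in total.
model = resnet18(num_classes=100)  # placeholder backbone, not the searched network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[80, 120], gamma=0.1)

for epoch in range(160):
    # ... one training epoch over train_loader (batch size 256) ...
    scheduler.step()
```

For the ImageNet setting, the table reports 100 epochs, initial learning rate 0.05, weight decay 3e-5, and batch size 512; the corresponding decay schedule is not quoted, so it is not sketched here.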