Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization
Authors: Yanyu Li, Pu Zhao, Geng Yuan, Xue Lin, Yanzhi Wang, Xin Chen
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our proposed architecture outperforms prior arts by around 1.0% top-1 accuracy under similar inference speed on the ImageNet-1000 classification task. Furthermore, we demonstrate the effectiveness of our width search on complex tasks including instance segmentation and image translation. All experiments are conducted on PyTorch 1.7 using NVIDIA RTX TITAN and GeForce RTX 2080Ti GPUs. |
| Researcher Affiliation | Collaboration | Yanyu Li¹, Pu Zhao¹, Geng Yuan¹, Xue Lin¹, Yanzhi Wang¹, Xin Chen²; ¹Northeastern University, ²Intel Corp. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and models are released. |
| Open Datasets | Yes | ImageNet ILSVRC-2012 contains 1.2 million training images and 50k testing images. We evaluate the proposed method on instance segmentation with the MS COCO dataset. Specifically, we demonstrate the horse2zebra dataset in Tab. 4. |
| Dataset Splits | No | The paper mentions 'ImageNet ILSVRC-2012 contains 1.2 million training images and 50k testing images' but does not provide specific details on the validation dataset split (e.g., percentages, sample counts, or explicit mention of a validation set beyond what might be standard for ImageNet). |
| Hardware Specification | Yes | All experiments are conducted on PyTorch 1.7 using NVIDIA RTX TITAN and GeForce RTX 2080Ti GPUs. |
| Software Dependencies | Yes | All experiments are conducted on PyTorch 1.7 using NVIDIA RTX TITAN and GeForce RTX 2080Ti GPUs. |
| Experiment Setup | Yes | Following standard data augmentation, we prune from the pretrained model with weight decay set to 3.05 × 10⁻⁵ and momentum of 0.875. Learning rate is rewound to 0.4 for a total batch size of 1024 synchronized on 8 GPUs. We search for 10 epochs, which is enough for per-layer pruning policy convergence as shown in Fig. 4. Then we freeze the policy (parameters in DBC) and anneal learning rate by cosine schedule for 50 epochs to achieve final accuracy. Thus we use a total of 60 epochs to deliver the compact well-trained model. |
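
The recipe in the Experiment Setup row maps onto a two-phase schedule: 10 search epochs for the per-layer pruning policy, then 50 cosine-annealed fine-tuning epochs with the policy frozen. The sketch below is a minimal PyTorch outline of that schedule; the backbone choice, the loop structure, and the `dbc` parameter naming are illustrative assumptions, while the numeric hyperparameters (LR 0.4, momentum 0.875, weight decay 3.05 × 10⁻⁵, 10 + 50 epochs) are the values reported above.

```python
# Sketch of the reported training schedule, assuming a standard torchvision
# backbone and a hypothetical "dbc" parameter name for the pruning policy.
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True)  # placeholder backbone

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.4,                 # learning rate rewound to 0.4
    momentum=0.875,
    weight_decay=3.05e-5,
)

search_epochs = 10          # per-layer pruning-policy search
finetune_epochs = 50        # cosine-annealed fine-tuning
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=finetune_epochs
)

for epoch in range(search_epochs + finetune_epochs):
    if epoch == search_epochs:
        # Freeze the searched pruning policy (DBC parameters) before fine-tuning.
        for name, param in model.named_parameters():
            if "dbc" in name:        # hypothetical naming, for illustration only
                param.requires_grad_(False)
    # ... one training epoch over ImageNet with standard augmentation ...
    if epoch >= search_epochs:
        scheduler.step()    # anneal the LR only during the 50 fine-tuning epochs
```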