S2HPruner: Soft-to-Hard Distillation Bridges the Discretization Gap in Pruning
Authors: Weihao Lin, Shengji Tang, Chong Yu, Peng Ye, Tao Chen
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we begin by validating the effectiveness of the proposed pruner using three benchmark datasets: CIFAR-100 [29], Tiny ImageNet [11], and ImageNet [11]. For CIFAR-100 and Tiny ImageNet, we evaluate three common CNN architectures, i.e., ResNet-50 [20], MobileNetV3 (MBV3) [24], and WRN28-10 [73], and two Transformer architectures, i.e., ViT [61] and Swin Transformer [37], across various pruning ratios including 15%, 35%, and 55%. For ImageNet, ResNet-50 serves as the backbone model, and we compare the proposed pruner with several structural pruning methods in terms of Top-1 accuracy and FLOPs. After the benchmarking, investigative experiments are performed on CIFAR-100 using ResNet-50 to elucidate the influence of each gradient term in Algorithm 1 and the gap-narrowing capacity of the proposed pruner. |
| Researcher Affiliation | Collaboration | Weihao Lin1, Shengji Tang1, Chong Yu2, Peng Ye3, Tao Chen1; 1School of Information Science and Technology, Fudan University, Shanghai, China; 2Academy for Engineering and Technology, Fudan University, Shanghai, China; 3Shanghai AI Laboratory, Shanghai, China; eetchen@fudan.edu.cn |
| Pseudocode | Yes | The pseudocode describing the whole training process can be referred to in Algorithm 1, and a visualization of the forward/backward passes is provided in Fig. 2. Algorithm 1: The training pseudo-code based on PyTorch automatic differentiation. (A hedged sketch of such a soft-to-hard distillation step is given below the table.) |
| Open Source Code | Yes | Code is publicly available on GitHub: https://github.com/opposj/S2HPruner. |
| Open Datasets | Yes | In this section, we begin by validating the effectiveness of the proposed pruner using three benchmark datasets: CIFAR-100 [29], Tiny ImageNet [11], and ImageNet [11]. |
| Dataset Splits | No | The paper mentions a "validation set" once when discussing gap metrics: "The gap metrics, i.e., the Jensen-Shannon divergence (JS) and L2 distance, are averaged over the entire validation set." However, it does not specify the size, percentage, or construction of this validation split, which limits reproducibility. (A sketch of how such gap metrics could be computed appears below the table.) |
| Hardware Specification | Yes | A cluster equipped with 8 NVIDIA A100 GPUs, 1024 GB of memory, and 120 CPUs is used to run experiments. A single GPU is used for experiments on CIFAR-100 and Tiny ImageNet. For ImageNet, four GPUs are used in parallel. |
| Software Dependencies | Yes | All experiments are conducted under the deep learning framework PyTorch [48], version 2.0.1, with Python 3.10. The CUDA version is 11.8. |
| Experiment Setup | Yes | Appendix A: Details of experiments. In this section, we provide the detailed training settings of the main manuscript. All experiments are conducted under the deep learning framework PyTorch [48], version 2.0.1, with Python 3.10. The CUDA version is 11.8. A cluster equipped with 8 NVIDIA A100 GPUs, 1024 GB of memory, and 120 CPUs is used to run experiments. A single GPU is used for experiments on CIFAR-100 and Tiny ImageNet. For ImageNet, four GPUs are used in parallel. |
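
The Pseudocode row references Algorithm 1, the training procedure built on PyTorch automatic differentiation. Since the full algorithm is only given in the paper and repository, the following is a minimal, hypothetical sketch of one soft-to-hard distillation step of the kind described there: a relaxed (soft-masked) network is trained on the task loss while the discretized (hard-masked) network is distilled from it. The `model(images, mask=...)` interface, the `mask_logits` parameterization, the straight-through binarization, and the single KD term are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a soft-to-hard distillation training step.
# Assumptions: the model accepts a per-channel mask, `mask_logits` is a
# learnable tensor, and `optimizer` covers both the weights and mask_logits.
import torch
import torch.nn.functional as F

def soft_to_hard_step(model, mask_logits, images, labels, optimizer, kd_weight=1.0):
    optimizer.zero_grad()

    soft_mask = torch.sigmoid(mask_logits)                   # relaxed mask in (0, 1)
    hard_mask = (soft_mask > 0.5).float()                    # discretized mask
    hard_mask = hard_mask + soft_mask - soft_mask.detach()   # straight-through estimator

    soft_logits = model(images, mask=soft_mask)              # soft (relaxed) network
    hard_logits = model(images, mask=hard_mask)              # hard (discretized) network

    # Task loss on the soft network; distill the hard network toward it.
    task_loss = F.cross_entropy(soft_logits, labels)
    kd_loss = F.kl_div(
        F.log_softmax(hard_logits, dim=1),
        F.softmax(soft_logits.detach(), dim=1),
        reduction="batchmean",
    )
    (task_loss + kd_weight * kd_loss).backward()
    optimizer.step()
    return task_loss.item(), kd_loss.item()
```

For the authors' exact gradient terms and update rules, refer to Algorithm 1 in the paper and the linked repository.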
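
The Dataset Splits row quotes the paper's gap metrics: Jensen-Shannon divergence and L2 distance between the soft and hard networks, averaged over the validation set. Below is a hedged sketch of how such metrics could be computed; `soft_forward` and `hard_forward` are illustrative callables, and measuring both metrics over softmax probabilities (rather than raw logits) is an assumption.

```python
# Hypothetical sketch of the gap metrics: mean JS divergence and L2 distance
# between soft- and hard-network outputs over a validation loader.
import torch
import torch.nn.functional as F

@torch.no_grad()
def gap_metrics(soft_forward, hard_forward, val_loader, device="cuda"):
    js_sum, l2_sum, n = 0.0, 0.0, 0
    for images, _ in val_loader:
        images = images.to(device)
        p = F.softmax(soft_forward(images), dim=1)   # soft-network probabilities
        q = F.softmax(hard_forward(images), dim=1)   # hard-network probabilities
        m = 0.5 * (p + q)
        # JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), computed per sample.
        js = 0.5 * (p * (p / m).log()).sum(dim=1) + 0.5 * (q * (q / m).log()).sum(dim=1)
        l2 = (p - q).norm(dim=1)
        js_sum += js.sum().item()
        l2_sum += l2.sum().item()
        n += images.size(0)
    return js_sum / n, l2_sum / n
```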