DEPrune: Depth-wise Separable Convolution Pruning for Maximizing GPU Parallelism
Authors: Cheonjun Park, Mincheol Park, Hyunchan Moon, Myung Kuk Yoon, Seokjin Go, Suhyun Kim, Won Woo Ro
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results show that DEPrune achieves up to 3.74× practical speedup in DSConv inference on GPUs while maintaining the accuracy of EfficientNet-B0 on ImageNet. We assess the effectiveness of DEPrune using ImageNet [8] and CIFAR-10 [25]. For the validation of image classification, we assess our method with CNN models using DSConv: MobileNet-V2 [43], EfficientNet-B0 [45], and MobileNet-V3 [22]. |
| Researcher Affiliation | Collaboration | Samsung Electronics; Yonsei University; Korea Institute of Science and Technology; LG Electronics; Ewha Womans University; Georgia Institute of Technology |
| Pseudocode | No | The paper describes methods through text and diagrams (e.g., Figure 4, 5, 7) but does not include explicit pseudocode or algorithm blocks labeled "Pseudocode" or "Algorithm". |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No]. Justification: We are going to open source the code later. |
| Open Datasets | Yes | We assess the effectiveness of DEPrune using ImageNet [8] and CIFAR-10 [25]. |
| Dataset Splits | No | The paper states it uses ImageNet and CIFAR-10, but does not explicitly specify train/validation/test split percentages or sample counts, nor does it cite predefined standard splits for these datasets. |
| Hardware Specification | Yes | We evaluate DEPrune using NVIDIA RTX 2080 Ti GPUs [1]. |
| Software Dependencies | No | The paper mentions using "Pytorch framework [39]" and "NVIDIA CUTLASS [24]" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We perform fine-tuning with only 65 epochs after conducting pruning methods. We set a batch size of 256. We use the SGD optimizer with a weight decay of 1×10⁻⁴ and a momentum of 0.9 for fine-tuning. The initial learning rate is set to 0.001 and divided by 10 every 30 epochs. |
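
As a concrete reading of this setup, the reported hyperparameters map onto a standard PyTorch fine-tuning loop. The sketch below is illustrative only: the MobileNet-V2 instance and random tensors are placeholders for the authors' pruned DSConv model and ImageNet data pipeline, which are not released.

```python
# Minimal sketch of the fine-tuning recipe reported in the table above.
# The hyperparameters (65 epochs, batch size 256, SGD with momentum 0.9,
# weight decay 1e-4, LR 0.001 divided by 10 every 30 epochs) come from the
# paper; the model and dataset below are placeholders, not the authors'
# pruned DSConv network or their ImageNet pipeline.
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset

model = torchvision.models.mobilenet_v2(num_classes=1000)  # stand-in for a pruned DSConv model

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.001,          # initial learning rate
    momentum=0.9,
    weight_decay=1e-4,
)
# Divide the learning rate by 10 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

# Tiny random stand-in for the ImageNet training set; the paper reports a
# batch size of 256 on the real data.
dummy = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,)))
train_loader = DataLoader(dummy, batch_size=256, shuffle=True)

for epoch in range(65):  # fine-tune for 65 epochs after pruning
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```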