DEPrune: Depth-wise Separable Convolution Pruning for Maximizing GPU Parallelism

Authors: Cheonjun Park, Mincheol Park, Hyunchan Moon, Myung Kuk Yoon, Seokjin Go, Suhyun Kim, Won Woo Ro

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment results show that DEPrune achieves up to 3.74× practical speedup in DSConv inference on GPUs while maintaining the accuracy of EfficientNet-B0 on ImageNet. We assess the effectiveness of DEPrune using ImageNet [8] and CIFAR-10 [25]. For the validation of image classification, we assess our method with CNN models using DSConv: MobileNet-V2 [43], EfficientNet-B0 [45], and MobileNet-V3 [22].
Researcher Affiliation | Collaboration | 1 Samsung Electronics, 2 Yonsei University, 3 Korea Institute of Science and Technology, 4 LG Electronics, 5 Ewha Womans University, 6 Georgia Institute of Technology
Pseudocode | No | The paper describes methods through text and diagrams (e.g., Figures 4, 5, and 7) but does not include explicit pseudocode or algorithm blocks labeled "Pseudocode" or "Algorithm".
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No]. Justification: We are going to open source the code later.
Open Datasets | Yes | We assess the effectiveness of DEPrune using ImageNet [8] and CIFAR-10 [25].
Dataset Splits | No | The paper states that it uses ImageNet and CIFAR-10, but does not specify exact train/validation/test split percentages or sample counts, nor does it cite predefined splits for these datasets.
Hardware Specification | Yes | We evaluate DEPrune using NVIDIA RTX 2080 Ti GPUs [1].
Software Dependencies | No | The paper mentions using the "Pytorch framework [39]" and "NVIDIA CUTLASS [24]" but does not provide specific version numbers for these software components.
Experiment Setup | Yes | We perform fine-tuning with only 65 epochs after conducting pruning methods. We set a batch size of 256. We use the SGD optimizer with a weight decay of 1×10⁻⁴ and momentum of 0.9 for fine-tuning. The initial learning rate is set to 0.001 and divided by 10 every 30 epochs.
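The reported hyperparameters map directly onto a standard PyTorch fine-tuning loop. The sketch below is an assumed reconstruction, not the paper's (unreleased) code: the `fine_tune` function name, the `train_loader` argument, and the device handling are illustrative placeholders, while the SGD settings, step schedule, and epoch count follow the values quoted above (batch size 256 would be set on the DataLoader).

```python
import torch
import torch.nn as nn


def fine_tune(model: nn.Module, train_loader, device: str = "cuda") -> None:
    """Fine-tune a pruned DSConv model with the schedule reported in the paper.

    Hypothetical sketch: assumes `train_loader` is a DataLoader built with
    batch_size=256 over the training split (e.g., ImageNet or CIFAR-10).
    """
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.001,           # initial learning rate
        momentum=0.9,
        weight_decay=1e-4,  # 1 × 10⁻⁴
    )
    # Learning rate divided by 10 every 30 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

    for epoch in range(65):  # 65 fine-tuning epochs after pruning
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```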