DFPC: Data flow driven pruning of coupled channels without data.

Authors: Tanay Narshana, Chaitanya Murti, Chiranjib Bhattacharyya

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the efficacy of DFPC for models trained on standard datasets. Since we prune coupled channels, we achieve up to 1.66x improvements in inference time for ResNet-101 trained on CIFAR-10 with a 5% accuracy drop without fine-tuning. With access to the ImageNet training set, we achieve significant improvements over the data-free method and see an improvement of at least 47.1% in speedup for a 2.3% accuracy drop for ResNet-50 against our baselines.
Researcher Affiliation | Collaboration | Observe.AI¹; Robert Bosch Centre for Cyber-Physical Systems, Indian Institute of Science²; Department of Computer Science and Automation, Indian Institute of Science³
Pseudocode | Yes | Algorithm 1 BGSC: Backwards Graph based Saliency Computation (an illustrative coupled-channel saliency sketch appears after the table)
Open Source Code | Yes | Our code is publicly available at https://github.com/TanayNarshana/DFPC-Pruning
Open Datasets | Yes | On ResNet-101 for the CIFAR-10 dataset, we obtain a 1.66x inference time speedup for a 5% accuracy drop without retraining. On ResNet-50 for the ImageNet dataset, we obtain an inference time speedup of at least 47.1% against our baselines for a 2.3% accuracy drop with retraining.
Dataset Splits | No | The paper describes training procedures and hyperparameters but does not explicitly detail a validation dataset split or strategy.
Hardware Specification | Yes | Table 3: Specifications of CPU hardware used for inference time measurements: CPU Model Name: AMD EPYC 7763 64-Core; CPU(s): 256; Thread(s): ... The GPU is an NVIDIA 1080 Ti with CUDA 10.2 and a memory of 12GB.
Software Dependencies | Yes | The software stack used for inferencing consisted of Python 3.9.7, PyTorch 1.10.1, and Torchvision 0.11.2. ... The GPU is an NVIDIA 1080 Ti with CUDA 10.2 and a memory of 12GB.
Experiment Setup | Yes | We train the models using the SGD optimizer with a momentum factor of 0.9 and weight decay of 5e-4 for 200 epochs using Cosine Annealing step sizes with an initial learning rate of 0.1. ... We prune 1% of the remaining channels in each pruning iteration followed by a finetuning of 3 epochs, each with step sizes of 1e-3, 1e-4, and 1e-5, per pruning iteration. The batch size was 256. After the pruning ends, we finally fine-tune the network for 90 epochs with a batch size of 512. We use the SGD optimizer with a momentum factor of 0.9 and weight decay of 1e-4 and Cosine Annealed step sizes with an initial learning rate of 0.1. (A minimal sketch of this recipe follows the table.)
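
The Pseudocode row names Algorithm 1 (BGSC: Backwards Graph based Saliency Computation) but does not reproduce it. The snippet below is only a hedged illustration of the underlying idea of data-free saliency for coupled channels, i.e. channels that must be pruned together because residual connections tie them across layers: it scores a coupled group from weight norms alone, without any data. The traversal order, the exact saliency formula, and every function name here are assumptions for illustration, not the paper's BGSC algorithm.

```python
# Illustrative data-free saliency for coupled channels (NOT the paper's BGSC).
# A "coupled group" is a set of convolutions that read from or write to the
# same channel index because of residual/skip connections, so the channel
# must be pruned from all of them at once.
import torch.nn as nn


def coupled_channel_saliency(convs_writing: list[nn.Conv2d],
                             convs_reading: list[nn.Conv2d],
                             channel_idx: int) -> float:
    """Score one coupled channel from weights alone (no data needed)."""
    score = 0.0
    for conv in convs_writing:
        # Outgoing filter: slice along the output-channel dimension.
        score += conv.weight[channel_idx].norm(p=1).item()
    for conv in convs_reading:
        # Incoming filter: slice along the input-channel dimension.
        score += conv.weight[:, channel_idx].norm(p=1).item()
    return score


def rank_coupled_channels(groups):
    """groups: list of (convs_writing, convs_reading, channel_idx) tuples.
    Returns group indices sorted from least to most salient, i.e. the
    order in which a pruner might remove them."""
    scores = [coupled_channel_saliency(w, r, c) for (w, r, c) in groups]
    return sorted(range(len(groups)), key=lambda i: scores[i])
```

Ranking lowest-scoring groups first would mimic the usual magnitude-based pruning convention; the paper's actual saliency measure and backward graph traversal should be taken from its Algorithm 1 and the released code.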
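
The post-pruning fine-tuning recipe quoted in the Experiment Setup row is concrete enough to sketch. The code below is a minimal reconstruction of that stage only (SGD with momentum 0.9, weight decay 1e-4, cosine-annealed learning rate starting at 0.1, 90 epochs); the `model`, `train_loader`, and function names are placeholders, not the authors' code, and the batch size of 512 is assumed to be set in the data loader.

```python
# Minimal sketch of the post-pruning fine-tuning stage described above.
# Assumes a pruned `model` and an ImageNet-style `train_loader` already exist.
import torch
import torch.nn as nn


def finetune_pruned_model(model, train_loader, epochs=90, device="cuda"):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    # SGD with momentum 0.9 and weight decay 1e-4, as stated in the setup.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    # Cosine-annealed step sizes with an initial learning rate of 0.1.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    for _ in range(epochs):
        model.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

The earlier iterative stage (pruning 1% of remaining channels per iteration, then 3 fine-tuning epochs at step sizes 1e-3, 1e-4, 1e-5 with batch size 256) would wrap a similar loop around repeated pruning calls; it is omitted here because it depends on the paper's pruning routine.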