Revisit Kernel Pruning with Lottery Regulated Grouped Convolutions

Authors: Shaochen Zhong, Guanqun Zhang, Ningjia Huang, Shuai Xu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments also demonstrate our method often outperforms comparable SOTA methods with lesser data augmentation needed, smaller finetuning budget required, and sometimes even much simpler procedure executed (e.g., one-shot v. iterative)." (Section 4, Experiments and Results)
Researcher Affiliation | Academia | Shaochen (Henry) Zhong, Ningjia Huang, and Shuai Xu (Department of Computer and Data Sciences, Case Western Reserve University; {sxz517, nxh239, sxx214}@case.edu); Guanqun Zhang (Center for Combinatorics, Nankai University; zhanggq1994@mail.nankai.edu.cn)
Pseudocode | Yes | Appendix A.2.1, "Greedy Grouped Kernel Pruning Procedure": Algorithm 1, "Generate Cℓ in grouped kernel pruning strategies" (an illustrative sketch appears after the table).
Open Source Code | Yes | "Please refer to our GitHub repository for code. As we advocate our proposed framework is able to shine a new light on kernel pruning under the context of densely structured pruning, we have prepared a GitHub repository with checkpoints placed on every stage of our method."
Open Datasets | Yes | "For datasets, we choose CIFAR-10 (Krizhevsky, 2009), Tiny-ImageNet (Wu et al., 2017), and ImageNet (ILSVRC-12) (Deng et al., 2009)."
Dataset Splits | No | The paper mentions training and testing on datasets such as CIFAR-10 and ImageNet, but does not explicitly state the training/validation/test splits (as percentages or counts) or refer to a specific predefined validation split.
Hardware Specification | Yes | "The following experiments are conducted on a 2.00GHz 4-core Intel Xeon CPU and Tesla V100."
Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | "For all experiments done on CIFAR-10 and Tiny-ImageNet, we train the baseline models for 300 epochs with the learning rate starting at 0.1 and dividing by 10 per every 100 epochs. The baseline model is trained using SGD with a weight-decay set to 5e-4, momentum set to 0.9, and a batch-size of 64. All data are augmented with random crop and random horizontal flip. For the experiments done on ImageNet, we train the ResNet-50 model for 90 epochs with the weight-decay set to 1e-4 and the learning rate dividing by 10 per every 30 epochs (while keeping all other settings the same as the CIFAR-10 and Tiny-ImageNet experiments). Our pruning settings are largely identical to our training settings except for the learning rate, which is set to 0.01 at the start." (These settings are sketched in code below.)
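
The Pseudocode row above only names Algorithm 1 without reproducing it, so the following is a rough, hypothetical sketch of what generating a grouped-kernel-pruning pattern can look like. It is not the paper's lottery-regulated procedure: the contiguous index grouping, the plain L1-norm keep criterion, and the function name `grouped_kernel_mask` are all assumptions made purely for illustration.

```python
# Hypothetical sketch only: stands in for, but does not reproduce, Algorithm 1.
import torch

def grouped_kernel_mask(weight: torch.Tensor, groups: int) -> torch.Tensor:
    """Build a {0,1} kernel mask whose surviving kernels form a grouped convolution.

    `weight` has shape (out_ch, in_ch, k, k). Output filters are split into
    `groups` contiguous blocks, and each block greedily claims the input
    channels whose kernels carry the largest total L1 norm for that block.
    """
    out_ch, in_ch = weight.shape[:2]
    assert out_ch % groups == 0 and in_ch % groups == 0
    filters_per_group = out_ch // groups
    keep_per_group = in_ch // groups
    scores = weight.abs().sum(dim=(2, 3))            # (out_ch, in_ch): L1 norm of each kernel
    mask = torch.zeros(out_ch, in_ch)
    available = torch.ones(in_ch, dtype=torch.bool)  # each input channel may serve only one group
    for g in range(groups):
        rows = slice(g * filters_per_group, (g + 1) * filters_per_group)
        block_scores = scores[rows].sum(dim=0)       # total score of each input channel for this block
        block_scores = block_scores.masked_fill(~available, float("-inf"))
        keep = torch.topk(block_scores, keep_per_group).indices
        mask[rows, keep] = 1.0
        available[keep] = False
    return mask

# Usage: prune a 3x3 convolution with 64 input/output channels into 4 groups.
w = torch.randn(64, 64, 3, 3)
m = grouped_kernel_mask(w, groups=4)
pruned = w * m[:, :, None, None]                     # zero out the discarded kernels
```

Because each block of output filters ends up reading from a disjoint subset of input channels, the surviving kernels can be executed as a standard grouped convolution, which is what makes this a densely structured form of pruning.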
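
The Experiment Setup row quotes the full hyperparameter recipe; below is a minimal PyTorch sketch of the quoted CIFAR-10 baseline settings. The backbone choice (`resnet18`), the crop padding of 4, and the dataloader worker count are assumptions not stated in the quoted text.

```python
# Minimal sketch of the quoted CIFAR-10 baseline-training recipe.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.RandomCrop(32, padding=4),     # padding=4 is a common CIFAR choice, not stated in the paper
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10("data/", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

model = torchvision.models.resnet18(num_classes=10)  # placeholder backbone, not the paper's exact model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)  # /10 every 100 epochs
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(300):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Per the quoted text, the ImageNet/ResNet-50 runs change this to 90 epochs, weight_decay=1e-4, and a learning-rate step every 30 epochs, and the pruning stage reuses the same recipe with the initial learning rate set to 0.01.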