Neuron Merging: Compensating for Pruned Neurons

Authors: Woojeong Kim, Suhyun Kim, Mincheol Park, Geonseok Jeon

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach over network pruning for various model architectures and datasets. As an example, for VGG-16 on CIFAR-10, we achieve an accuracy of 93.16% while reducing 64% of total parameters, without any fine-tuning. ... We evaluate the proposed approach with several popular models, which are LeNet [13], VGG [21], ResNet [4], and WideResNet [28], on Fashion-MNIST [24], CIFAR [8], and ImageNet [20] datasets.
Researcher Affiliation | Academia | Woojeong Kim, Suhyun Kim, Mincheol Park, Geonseok Jeon; Korea Institute of Science and Technology; {kwj962004, dr.suhyun.kim, lotsberry, hotchya}@gmail.com
Pseudocode | Yes | Algorithm 1: Decomposition Algorithm; Algorithm 2: MostSim Algorithm; Algorithm 3: MostSim Algorithm with BN (a minimal sketch of the merging step follows the table)
Open Source Code | Yes | The code can be found here: https://github.com/friendshipkim/neuron-merging
Open Datasets | Yes | We evaluate the proposed approach with several popular models, which are LeNet [13], VGG [21], ResNet [4], and WideResNet [28], on Fashion-MNIST [24], CIFAR [8], and ImageNet [20] datasets.
Dataset Splits | Yes | For LeNet, the learning rate is reduced by one-tenth every 15 epochs of the total 60 epochs. Weight decay is set to 1e-4, and batch size to 128. For VGG and ResNet, the learning rate is reduced by one-tenth at epochs 100 and 150 of the total 200 epochs. Weight decay is set to 5e-4, and batch size to 128. Weights are randomly initialized before training. ... For CIFAR, we follow the setting in He et al. [6].
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions using SGD for training, but does not provide specific software versions for libraries, frameworks, or programming languages (e.g., PyTorch version, Python version, CUDA version).
Experiment Setup | Yes | To train the baseline models, we employ SGD with momentum 0.9. The learning rate starts at 0.1, with different annealing strategies per model. For LeNet, the learning rate is reduced by one-tenth every 15 epochs of the total 60 epochs. Weight decay is set to 1e-4, and batch size to 128. For VGG and ResNet, the learning rate is reduced by one-tenth at epochs 100 and 150 of the total 200 epochs. Weight decay is set to 5e-4, and batch size to 128. Weights are randomly initialized before training. (A training-configuration sketch follows the table.)
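
The Decomposition and MostSim procedures listed in the Pseudocode row boil down to replacing each pruned neuron with a scaled copy of its most similar surviving neuron and folding that scale into the next layer. Below is a minimal NumPy sketch of that idea, assuming two consecutive fully connected layers with a ReLU between them and no batch normalization; the function name `merge_pruned_neurons` and its interface are illustrative assumptions, not code from the authors' repository linked above.

```python
import numpy as np

def merge_pruned_neurons(W_cur, W_next, keep_idx):
    """Illustrative sketch (not the authors' code) of neuron merging.

    W_cur:    (N, D) weights of the current layer; each row is a neuron.
    W_next:   (M, N) weights of the next layer, consuming those N neurons.
    keep_idx: list of neuron indices kept after pruning.
    Returns the pruned current-layer weights and a compensated next layer.
    """
    N = W_cur.shape[0]
    keep_set = set(keep_idx)
    kept = W_cur[list(keep_idx)]                     # (K, D) surviving neurons
    kept_norm = np.linalg.norm(kept, axis=1) + 1e-12

    # Z maps the K kept neurons back to the N original neuron positions,
    # so that W_cur is approximated by Z @ kept.
    Z = np.zeros((N, len(keep_idx)))
    for col, i in enumerate(keep_idx):
        Z[i, col] = 1.0                              # kept neurons pass through unchanged

    for i in range(N):
        if i in keep_set:
            continue
        w = W_cur[i]
        cos = kept @ w / (kept_norm * (np.linalg.norm(w) + 1e-12))
        j = int(np.argmax(cos))                      # most similar kept neuron
        # Nonnegative scale ratio; since ReLU(s * x) = s * ReLU(x) for s >= 0,
        # the scale can be moved past the activation into the next layer.
        Z[i, j] = np.linalg.norm(w) / kept_norm[j]

    return kept, W_next @ Z                          # next layer absorbs Z
```

Because every row of Z holds a single nonnegative entry, ReLU commutes with Z, so W_next @ ReLU(W_cur @ x) is approximated by (W_next @ Z) @ ReLU(kept @ x) without any fine-tuning; Algorithm 3 extends the same matching to layers followed by batch normalization.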
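
The quoted experiment setup amounts to SGD with momentum 0.9, an initial learning rate of 0.1, and model-dependent decay and weight-decay settings. The following PyTorch sketch reconstructs that schedule under the assumption that the standard `torch.optim` schedulers match the described annealing; it is not the authors' training script (the paper does not state framework versions), and the batch size of 128 would be set in the data loader.

```python
import torch.nn as nn
from torch import optim

def make_training_config(model: nn.Module, arch: str):
    """Optimizer/scheduler sketch matching the reported hyperparameters.

    LeNet:        60 epochs, lr 0.1 divided by 10 every 15 epochs, weight decay 1e-4.
    VGG / ResNet: 200 epochs, lr 0.1 divided by 10 at epochs 100 and 150, weight decay 5e-4.
    All models:   SGD with momentum 0.9, batch size 128 (configured in the DataLoader).
    """
    if arch == "lenet":
        weight_decay, total_epochs = 1e-4, 60
    else:  # vgg, resnet, wideresnet
        weight_decay, total_epochs = 5e-4, 200

    optimizer = optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=weight_decay)
    if arch == "lenet":
        scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)
    else:
        scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                                   milestones=[100, 150], gamma=0.1)
    return optimizer, scheduler, total_epochs
```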