Neuron Merging: Compensating for Pruned Neurons

Authors: Woojeong Kim, Suhyun Kim, Mincheol Park, Geonseok Jeon

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach over network pruning for various model architectures and datasets. As an example, for VGG-16 on CIFAR-10, we achieve an accuracy of 93.16% while reducing 64% of total parameters, without any fine-tuning. ... We evaluate the proposed approach with several popular models, which are LeNet [13], VGG [21], ResNet [4], and WideResNet [28], on Fashion-MNIST [24], CIFAR [8], and ImageNet [20] datasets.
Researcher Affiliation | Academia | Woojeong Kim, Suhyun Kim, Mincheol Park, Geonseok Jeon; Korea Institute of Science and Technology; {kwj962004, dr.suhyun.kim, lotsberry, hotchya}@gmail.com
Pseudocode | Yes | Algorithm 1: Decomposition Algorithm; Algorithm 2: MostSim Algorithm; Algorithm 3: MostSim Algorithm with BN (a minimal sketch of the merging step follows the table)
Open Source Code | Yes | The code can be found here: https://github.com/friendshipkim/neuron-merging
Open Datasets | Yes | We evaluate the proposed approach with several popular models, which are LeNet [13], VGG [21], ResNet [4], and WideResNet [28], on Fashion-MNIST [24], CIFAR [8], and ImageNet [20] datasets.
Dataset Splits | Yes | For LeNet, the learning rate is reduced by one-tenth every 15 epochs of the total 60 epochs. Weight decay is set to 1e-4, and batch size to 128. For VGG and ResNet, the learning rate is reduced by one-tenth at epochs 100 and 150 of the total 200 epochs. Weight decay is set to 5e-4, and batch size to 128. Weights are randomly initialized before training. ... For CIFAR, we follow the setting in He et al. [6].
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions using SGD for training, but does not provide specific software versions for libraries, frameworks, or programming languages (e.g., PyTorch version, Python version, CUDA version).
Experiment Setup | Yes | To train the baseline models, we employ SGD with momentum 0.9. The learning rate starts at 0.1, with different annealing strategies per model. For LeNet, the learning rate is reduced by one-tenth every 15 epochs of the total 60 epochs. Weight decay is set to 1e-4, and batch size to 128. For VGG and ResNet, the learning rate is reduced by one-tenth at epochs 100 and 150 of the total 200 epochs. Weight decay is set to 5e-4, and batch size to 128. Weights are randomly initialized before training. (A training-configuration sketch follows the table.)
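
The Decomposition and MostSim procedures listed in the Pseudocode row boil down to replacing each pruned neuron with a scaled copy of its most similar surviving neuron and folding that scale into the next layer. Below is a minimal NumPy sketch of that idea, assuming two consecutive fully connected layers with a ReLU between them and no batch normalization; the function name `merge_pruned_neurons` and its interface are illustrative assumptions, not code from the authors' repository linked above.

```python
import numpy as np

def merge_pruned_neurons(W_cur, W_next, keep_idx):
    """Illustrative sketch (not the authors' code) of neuron merging.

    W_cur:    (N, D) weights of the current layer; each row is a neuron.
    W_next:   (M, N) weights of the next layer, consuming those N neurons.
    keep_idx: list of neuron indices kept after pruning.
    Returns the pruned current-layer weights and a compensated next layer.
    """
    N = W_cur.shape[0]
    keep_set = set(keep_idx)
    kept = W_cur[list(keep_idx)]                     # (K, D) surviving neurons
    kept_norm = np.linalg.norm(kept, axis=1) + 1e-12

    # Z maps the K kept neurons back to the N original neuron positions,
    # so that W_cur is approximated by Z @ kept.
    Z = np.zeros((N, len(keep_idx)))
    for col, i in enumerate(keep_idx):
        Z[i, col] = 1.0                              # kept neurons pass through unchanged

    for i in range(N):
        if i in keep_set:
            continue
        w = W_cur[i]
        cos = kept @ w / (kept_norm * (np.linalg.norm(w) + 1e-12))
        j = int(np.argmax(cos))                      # most similar kept neuron
        # Nonnegative scale ratio; since ReLU(s * x) = s * ReLU(x) for s >= 0,
        # the scale can be moved past the activation into the next layer.
        Z[i, j] = np.linalg.norm(w) / kept_norm[j]

    return kept, W_next @ Z                          # next layer absorbs Z
```

Because every row of Z holds a single nonnegative entry, ReLU commutes with Z, so W_next @ ReLU(W_cur @ x) is approximated by (W_next @ Z) @ ReLU(kept @ x) without any fine-tuning; Algorithm 3 extends the same matching to layers followed by batch normalization.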
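
The quoted experiment setup amounts to SGD with momentum 0.9, an initial learning rate of 0.1, and model-dependent decay and weight-decay settings. The following PyTorch sketch reconstructs that schedule under the assumption that the standard `torch.optim` schedulers match the described annealing; it is not the authors' training script (the paper does not state framework versions), and the batch size of 128 would be set in the data loader.

```python
import torch.nn as nn
from torch import optim

def make_training_config(model: nn.Module, arch: str):
    """Optimizer/scheduler sketch matching the reported hyperparameters.

    LeNet:        60 epochs, lr 0.1 divided by 10 every 15 epochs, weight decay 1e-4.
    VGG / ResNet: 200 epochs, lr 0.1 divided by 10 at epochs 100 and 150, weight decay 5e-4.
    All models:   SGD with momentum 0.9, batch size 128 (configured in the DataLoader).
    """
    if arch == "lenet":
        weight_decay, total_epochs = 1e-4, 60
    else:  # vgg, resnet, wideresnet
        weight_decay, total_epochs = 5e-4, 200

    optimizer = optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=weight_decay)
    if arch == "lenet":
        scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)
    else:
        scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                                   milestones=[100, 150], gamma=0.1)
    return optimizer, scheduler, total_epochs
```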