Neuron Merging: Compensating for Pruned Neurons
Authors: Woojeong Kim, Suhyun Kim, Mincheol Park, Geonseok Jeon
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach over network pruning for various model architectures and datasets. As an example, for VGG-16 on CIFAR-10, we achieve an accuracy of 93.16% while reducing 64% of total parameters, without any fine-tuning. ... We evaluate the proposed approach with several popular models, which are LeNet [13], VGG [21], ResNet [4], and WideResNet [28], on Fashion-MNIST [24], CIFAR [8], and ImageNet [20] datasets. |
| Researcher Affiliation | Academia | Woojeong Kim, Suhyun Kim, Mincheol Park, Geonseok Jeon; Korea Institute of Science and Technology; {kwj962004, dr.suhyun.kim, lotsberry, hotchya}@gmail.com |
| Pseudocode | Yes | Algorithm 1 Decomposition Algorithm; Algorithm 2 MostSim Algorithm; Algorithm 3 MostSim Algorithm with BN (a simplified sketch of the most-similar-neuron merging step appears below the table) |
| Open Source Code | Yes | The code can be found here: https://github.com/friendshipkim/neuron-merging |
| Open Datasets | Yes | We evaluate the proposed approach with several popular models, which are LeNet [13], VGG [21], ResNet [4], and WideResNet [28], on Fashion-MNIST [24], CIFAR [8], and ImageNet [20] datasets. |
| Dataset Splits | Yes | For Le Net, the learning rate is reduced by one-tenth for every 15 of the total 60 epochs. Weight decay is set to 1e-4, and batch size to 128. For VGG and Res Net, the learning rate is reduced by one-tenth at 100 and 150 of the total 200 epochs. Weight decay is set to 5e-4, and batch size to 128. Weights are randomly initialized before the training. ... For CIFAR, we follow the setting in He et al. [6]. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions using SGD for training, but does not provide specific software versions for libraries, frameworks, or programming languages (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | To train the baseline models, we employ SGD with the momentum of 0.9. The learning rate starts at 0.1, with different annealing strategies per model. For LeNet, the learning rate is reduced by one-tenth for every 15 of the total 60 epochs. Weight decay is set to 1e-4, and batch size to 128. For VGG and ResNet, the learning rate is reduced by one-tenth at 100 and 150 of the total 200 epochs. Weight decay is set to 5e-4, and batch size to 128. Weights are randomly initialized before the training. (A sketch of this training recipe follows the table.) |
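
The Pseudocode row references the paper's Decomposition and MostSim algorithms. Below is a minimal sketch of the underlying idea for a pair of fully connected ReLU layers: each pruned neuron's contribution is folded into its most cosine-similar preserved neuron, scaled by the ratio of weight norms, and the resulting merging matrix is absorbed into the next layer. The function name `merge_pruned_neurons`, the threshold default, and the NumPy formulation are illustrative assumptions, not the authors' code; the official implementation is at https://github.com/friendshipkim/neuron-merging.

```python
import numpy as np

def merge_pruned_neurons(W_cur, W_next, keep_idx, threshold=0.5):
    """Fold pruned neurons of layer l into their most similar preserved neurons.

    W_cur:    (N_l, N_{l-1}) weights of the layer being pruned.
    W_next:   (N_{l+1}, N_l) weights of the following layer.
    keep_idx: list of preserved neuron indices in layer l.
    Returns the pruned current layer and the compensated next layer.
    """
    N_l = W_cur.shape[0]
    Z = np.zeros((N_l, len(keep_idx)))          # merging matrix (W_cur ~= Z @ W_kept)
    kept = W_cur[keep_idx]                      # (P, N_{l-1}) preserved weight vectors
    kept_norm = np.linalg.norm(kept, axis=1)

    for i in range(N_l):
        if i in keep_idx:
            Z[i, keep_idx.index(i)] = 1.0       # preserved neuron maps to itself
            continue
        w = W_cur[i]
        cos = kept @ w / (kept_norm * np.linalg.norm(w) + 1e-12)
        j = int(np.argmax(cos))
        if cos[j] >= threshold:                 # merge only if sufficiently similar
            # ReLU is positively scale-invariant, so a_i ~= s * a_j with s > 0
            Z[i, j] = np.linalg.norm(w) / (kept_norm[j] + 1e-12)
        # otherwise the neuron is simply dropped (its row of Z stays zero)

    return kept, W_next @ Z                     # pruned layer, compensated next layer
```

Biases and batch-normalization statistics, which Algorithm 3 in the paper accounts for, are omitted here for brevity.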
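
The Experiment Setup row quotes the baseline training recipe for VGG/ResNet on CIFAR. The following is a minimal PyTorch sketch of that recipe using standard `torch.optim` components; the model and the commented data loader are placeholders, not the authors' training code.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for VGG-16 / ResNet on CIFAR (3x32x32 inputs).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()

# SGD with momentum 0.9, initial lr 0.1, weight decay 5e-4 (as quoted above).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Learning rate reduced by one-tenth at epochs 100 and 150 of 200.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    # train_loader would yield CIFAR batches of size 128, e.g.:
    # for inputs, targets in train_loader:
    #     optimizer.zero_grad()
    #     loss = criterion(model(inputs), targets)
    #     loss.backward()
    #     optimizer.step()
    scheduler.step()
```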