EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
Authors: Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically the effectiveness of the proposed method through extensive experiments. In particular, we highlight that the improvements are especially significant for more challenging datasets and networks. With negligible loss of accuracy, an iterative-pruning version gives a 10× reduction in model size and an 8× reduction in FLOPs on WideResNet32. Our code is available here. 5. Experiments: In this section, we aim to verify the effectiveness of EigenDamage in reducing the test-time resource requirements of a network without significantly sacrificing accuracy. We compare EigenDamage with other compression methods in terms of test accuracy, reduction in weights, reduction in FLOPs, and inference wall-clock time speedup. (A sketch of how such reduction metrics can be measured follows the table.) |
| Researcher Affiliation | Collaboration | 1) Department of Computer Science, University of Toronto, Toronto, Canada; 2) Vector Institute, Toronto, Canada; 3) NVIDIA. |
| Pseudocode | Yes | Algorithm 1: Pruning in the Kronecker-factored eigenbasis, i.e., EigenDamage. For simplicity, we focus on a single layer. ⊙ denotes elementwise multiplication. (A hedged single-layer sketch of this procedure follows the table.) |
| Open Source Code | Yes | Our code is available here. |
| Open Datasets | Yes | We make use of three standard benchmark datasets: CIFAR10, CIFAR100 (Krizhevsky, 2009) and Tiny-ImageNet. |
| Dataset Splits | No | The paper mentions training and testing on standard benchmark datasets (CIFAR10, CIFAR100, Tiny-ImageNet) but does not explicitly state the specific train/validation/test splits, their percentages, or sample counts, nor does it refer to a predefined standard split for all three parts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions training with SGD and various neural network architectures (VGGNet, ResNet) and methods (K-FAC, NN Slimming) but does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | We train the networks for 150 epochs for CIFAR datasets and 300 epochs for Tiny-ImageNet with an initial learning rate of 0.1 and weight decay of 2e-4. The learning rate is decayed by a factor of 10 at 1/4 of the total number of training epochs. For the networks trained with L1 sparsity on BatchNorm, we followed the same settings as in Liu et al. (2017). After pruning, the network is finetuned for 150 epochs with an initial learning rate of 1e-3 and weight decay of 1e-4. (A hedged PyTorch sketch of this schedule follows the table.) |
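
The experiments row above compares methods by test accuracy, reduction in weights, reduction in FLOPs, and inference wall-clock speedup. The snippet below is a minimal sketch, not taken from the paper, of how such reduction ratios could be measured in PyTorch; all function names (`count_params`, `conv_flops`, `wall_clock`) are illustrative.

```python
# Minimal sketch (not from the paper) of how the reported compression
# metrics could be computed: reduction in weights, reduction in FLOPs,
# and wall-clock inference speedup. All names here are illustrative.
import time
import torch
import torch.nn as nn


def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def conv_flops(layer: nn.Conv2d, out_h: int, out_w: int) -> int:
    """Multiply-accumulate count for one conv layer on one input."""
    kh, kw = layer.kernel_size
    return layer.out_channels * out_h * out_w * layer.in_channels * kh * kw


@torch.no_grad()
def wall_clock(model: nn.Module, x: torch.Tensor, iters: int = 100) -> float:
    """Average forward-pass time in seconds."""
    model.eval()
    start = time.time()
    for _ in range(iters):
        model(x)
    return (time.time() - start) / iters


# The reported reductions are dense/pruned ratios, e.g. 10x in weights
# and 8x in FLOPs on WideResNet32:
# weight_reduction = count_params(dense_net) / count_params(pruned_net)
# speedup          = wall_clock(dense_net, x) / wall_clock(pruned_net, x)
```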
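The pseudocode row refers to Algorithm 1, which prunes a single layer in the Kronecker-factored eigenbasis (KFE). The sketch below illustrates the core idea under the assumption that the two Kronecker factors of the layer's Fisher, an input second-moment matrix `A` and an output-gradient second-moment matrix `S`, have already been estimated K-FAC style. The function name, thresholding rule, and returned bottleneck form are illustrative; this is not the authors' released implementation.

```python
# Rough sketch of single-layer pruning in the Kronecker-factored eigenbasis.
# Assumes Kronecker factors A (d_in x d_in) and S (d_out x d_out) are given.
import torch


def eigendamage_prune_layer(W, A, S, sparsity):
    """Prune a (d_out, d_in) weight matrix in the Kronecker-factored eigenbasis."""
    # Eigendecompose the two (symmetric PSD) Kronecker factors.
    lam_a, Q_a = torch.linalg.eigh(A)
    lam_s, Q_s = torch.linalg.eigh(S)

    # Rotate the weights into the Kronecker-factored eigenbasis (KFE).
    W_kfe = Q_s.T @ W @ Q_a

    # The Fisher is approximately diagonal in the KFE, so removing entry
    # (i, j) costs roughly lam_s[i] * lam_a[j] * W_kfe[i, j] ** 2.
    importance = torch.outer(lam_s, lam_a) * W_kfe ** 2

    # Zero out the least important fraction of entries (elementwise mask, ⊙).
    k = max(1, int(importance.numel() * sparsity))
    threshold = importance.flatten().kthvalue(k).values
    mask = (importance > threshold).to(W_kfe.dtype)

    # The paper keeps the layer in a factored bottleneck form (rotation,
    # masked KFE weights, rotation) rather than a single dense matrix.
    return Q_s, W_kfe * mask, Q_a
```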
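The experiment-setup row quotes the training and fine-tuning hyperparameters. The following is a minimal PyTorch sketch of that schedule, assuming standard SGD; the momentum value and batch size are assumptions not stated in the quote, and the single decay milestone follows the quoted schedule (1/4 of the total epochs).

```python
# Minimal sketch (assumed PyTorch setup, not the authors' released code) of
# the quoted recipe: SGD, initial lr 0.1, weight decay 2e-4, lr decayed by
# 10x at 1/4 of the epochs; fine-tuning after pruning uses lr 1e-3, wd 1e-4.
import torch


def make_optimizer(model, epochs, lr, weight_decay):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9,            # assumption, not quoted
                                weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[epochs // 4], gamma=0.1)  # decay lr by 10x
    return optimizer, scheduler


# Pre-training: 150 epochs on CIFAR, 300 on Tiny-ImageNet.
# opt, sched = make_optimizer(net, epochs=150, lr=0.1, weight_decay=2e-4)
# Fine-tuning the pruned network for 150 epochs:
# opt, sched = make_optimizer(pruned_net, epochs=150, lr=1e-3, weight_decay=1e-4)
```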