Improving Deep Neural Network Sparsity through Decorrelation Regularization
Authors: Xiaotian Zhu, Wengang Zhou, Houqiang Li
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on the CIFAR10/100 and ILSVRC2012 datasets show that when our decorrelation regularization is combined with group LASSO, the correlation between filters can be effectively weakened, which increases the sparsity of the resulting model and leads to better compression performance. |
| Researcher Affiliation | Academia | CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, EEIS Department, University of Science and Technology of China |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code. |
| Open Datasets | Yes | The experiments on the CIFAR10/100 and ILSVRC2012 datasets show that when our decorrelation regularization is combined with group LASSO, the correlation between filters can be effectively weakened, which increases the sparsity of the resulting model and leads to better compression performance. |
| Dataset Splits | Yes | CIFAR is a medium scale image classification dataset introduced in [Krizhevsky and Hinton, 2009]. The dataset has 50000 images for training and 10000 images for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | For both architectures, the weight decay parameter λ is set to 0.0005, and the structured sparsity parameter η is 0.0015 for VGG-16 and 0.001 for ResNet-56... The decorrelation parameter γ is set to 5 and the sparsity threshold τ is set to 1e-4 according to grid search. We use stochastic gradient descent with momentum 0.9 for training. The initial learning rate is set to 0.1 and decays every 30 epochs with a factor of 0.5. |
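
The rows above only quote the paper's claim that decorrelation regularization combined with group LASSO weakens filter correlation and increases sparsity. The sketch below shows one plausible way such a combined penalty could be written in PyTorch; the paper's exact decorrelation formula is not given in this excerpt, so the off-diagonal Gram-matrix penalty, the per-filter grouping, and the helper names (`group_lasso`, `decorrelation`, `sparsity_loss`) are assumptions, with η and γ taken from the Experiment Setup row.

```python
import torch
import torch.nn as nn

def group_lasso(weight):
    """Group-LASSO term over the output filters of a conv layer.

    `weight` has shape (out_channels, in_channels, kH, kW); each output
    filter is one group, so the term is the sum of per-filter L2 norms."""
    return weight.flatten(1).norm(dim=1).sum()

def decorrelation(weight, eps=1e-8):
    """Assumed decorrelation term (not necessarily the paper's formula):
    penalize the off-diagonal entries of the cosine-similarity (normalized
    Gram) matrix between flattened filters, pushing distinct filters apart."""
    w = weight.flatten(1)
    w = w / (w.norm(dim=1, keepdim=True) + eps)
    gram = w @ w.t()
    off_diag = gram - torch.diag(torch.diag(gram))
    return off_diag.pow(2).sum()

def sparsity_loss(model, eta=0.0015, gamma=5.0):
    """Combined penalty added to the task loss; eta (structured sparsity)
    and gamma (decorrelation) default to the VGG-16 values in the table."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            penalty = penalty + eta * group_lasso(m.weight) + gamma * decorrelation(m.weight)
    return penalty
```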
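
The Experiment Setup row reports SGD with momentum 0.9, weight decay 0.0005, an initial learning rate of 0.1 halved every 30 epochs, and a sparsity threshold τ = 1e-4. A minimal PyTorch sketch of that schedule follows; the tiny model, random batches, epoch count, and the pruning step are placeholders (the paper's actual VGG-16/ResNet-56 pipeline and pruning procedure are not described here), and `sparsity_loss` reuses the helper from the previous sketch.

```python
import torch
import torch.nn as nn

# Placeholder model and data; only the hyper-parameters below come from the table.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)

num_epochs = 90  # total epoch count is not stated in the excerpt

for epoch in range(num_epochs):
    images = torch.randn(8, 3, 32, 32)        # stand-in for a CIFAR batch
    labels = torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    loss = criterion(model(images), labels) + sparsity_loss(model)
    loss.backward()
    optimizer.step()
    scheduler.step()  # halves the learning rate every 30 epochs

# Sparsity threshold from the table: filters whose L2 norm falls below
# tau = 1e-4 are zeroed out; the exact pruning procedure is an assumption.
tau = 1e-4
with torch.no_grad():
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            norms = m.weight.flatten(1).norm(dim=1)
            m.weight[norms < tau] = 0.0
```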