Combined Group and Exclusive Sparsity for Deep Neural Networks

Authors: Jaehong Yoon, Sung Ju Hwang

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our method on multiple public datasets, and the results show that our method can obtain more compact and efficient networks while also improving the performance over the base networks with full weights, as opposed to existing sparsity regularizations that often obtain efficiency at the expense of prediction accuracy. We validate our regularized network on four public datasets with different base networks, on which it achieves a compact, lighter model while achieving superior performance over networks trained with other sparsity-inducing regularizers, sometimes obtaining even better accuracy than the full model. We perform all experiments with convolutional neural networks as the base network model.
Researcher Affiliation | Collaboration | UNIST, Ulsan, South Korea; AITrics, Seoul, South Korea.
Pseudocode | Yes | Algorithm 1: Stochastic Proximal Gradient Algorithm for Combined (2,1)- and (1,2)-Norm Regularization (a hedged sketch of this update appears after the table).
Open Source Code | Yes | Code available at https://github.com/jaehong-yoon93/CGES
Open Datasets | Yes | We validate our method on four public datasets for classification... 1) MNIST. This dataset contains 70,000 28×28 grayscale images of handwritten digits... 2) CIFAR-10. This dataset consists of 60,000 images sized 32×32... 3) CIFAR-100. This dataset also consists of 60,000 images of 32×32 pixels as in CIFAR-10... 4) ImageNet-1K. This is the dataset for the 2012 ImageNet Large Scale Visual Recognition Challenge (Deng et al., 2009).
Dataset Splits | Yes | MNIST. This dataset contains 70,000 28×28 grayscale images of handwritten digits, where there are 6,000 training instances and 1,000 test instances per class. CIFAR-10. For each class, there are 5,000 images for training and 1,000 images for test. CIFAR-100. For each class, 500 images are used for training and 100 images are used for test. ImageNet-1K... For evaluation, we used the validation set that consists of 50,000 images, following the standard procedure. (A loading sketch with these splits appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using 'Tensorflow' but does not specify its version number, nor does it list other software dependencies with version numbers.
Experiment Setup | No | The paper describes general training strategies ('train all networks from the scratch', 'fine-tune the network'), but it does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific optimizer settings needed to reproduce the experimental setup. (The placeholder config sketch below illustrates what is missing.)
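For the Pseudocode row, the following is a minimal NumPy sketch of one stochastic proximal gradient step for the combined group/exclusive regularizer. It is not the authors' implementation: the names (cges_prox_step, prox_group, prox_exclusive), the flat-array parameterization, and the one-step soft-thresholding form of the exclusive-sparsity update are assumptions; the exact proximal operator of the squared (1,2)-norm generally needs an inner iteration.

```python
import numpy as np

def prox_group(w_g, step):
    # (2,1)-norm prox: group-wise soft-thresholding. If the group's
    # l2 norm falls below the threshold, the whole group is zeroed,
    # pruning the corresponding feature/neuron.
    norm = np.linalg.norm(w_g)
    if norm == 0.0:
        return w_g
    return max(0.0, 1.0 - step / norm) * w_g

def prox_exclusive(w_g, step):
    # Exclusive (1,2) sparsity, one-step approximation: element-wise
    # soft-thresholding with a threshold scaled by the group's l1 norm,
    # so weights within a group compete with each other.
    thresh = step * np.abs(w_g).sum()
    return np.sign(w_g) * np.maximum(np.abs(w_g) - thresh, 0.0)

def cges_prox_step(W, grad, lr, lam, mu, groups):
    # One update: a plain SGD step on the data loss, then the two
    # proximal maps applied group by group. `mu` mixes exclusive (mu)
    # and group (1 - mu) sparsity; `groups` is a list of index arrays,
    # e.g. the outgoing weights of each neuron.
    W = W - lr * grad
    for g in groups:
        W[g] = prox_exclusive(W[g], lr * lam * mu)
        W[g] = prox_group(W[g], lr * lam * (1.0 - mu))
    return W
```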
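For the Dataset Splits row, the quoted splits match the standard distributions of these benchmarks. As a sanity check, here is a sketch using the tf.keras dataset loaders; the paper uses TensorFlow, though not necessarily this loading path, and ImageNet-1K is not bundled with Keras and must be obtained separately.

```python
import tensorflow as tf

# MNIST: 6,000 train / 1,000 test images per class over 10 classes.
(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.mnist.load_data()
assert x_tr.shape == (60000, 28, 28) and x_te.shape == (10000, 28, 28)

# CIFAR-10: 5,000 train / 1,000 test images per class over 10 classes.
(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.cifar10.load_data()
assert x_tr.shape == (50000, 32, 32, 3) and x_te.shape == (10000, 32, 32, 3)

# CIFAR-100: 500 train / 100 test images per class over 100 classes.
(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.cifar100.load_data()
assert x_tr.shape == (50000, 32, 32, 3) and x_te.shape == (10000, 32, 32, 3)

# ImageNet-1K is downloaded separately; evaluation uses its standard
# 50,000-image validation set.
```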
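For the Experiment Setup row, none of the values below are reported in the paper; this placeholder skeleton only enumerates the hyperparameters a reproduction would need to recover, presumably from the released code linked above.

```python
# Hypothetical reproduction checklist: every None is a value the paper
# does not report and would have to come from the released code or from
# tuning. Nothing here is taken from the paper itself.
config = {
    "optimizer": None,      # Algorithm 1 is proximal SGD, but settings such as momentum are unreported
    "learning_rate": None,
    "batch_size": None,
    "epochs": None,
    "lambda": None,         # overall regularization strength
    "mu": None,             # group vs. exclusive mixing weight
}
missing = [name for name, value in config.items() if value is None]
print("Unreported hyperparameters:", ", ".join(missing))
```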