Shakeout: A New Regularized Deep Neural Network Training Scheme

Authors: Guoliang Kang, Jun Li, Dacheng Tao

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have empirically evaluated the Shakeout scheme and demonstrated that sparse network weights are obtained via Shakeout training. Our classification experiments on real-life image datasets MNIST and CIFAR10 show that Shakeout deals with over-fitting effectively. ... In this section, we report empirical evaluation of the Shakeout scheme in training deep neural networks on real-life datasets.
Researcher Affiliation | Academia | Guoliang Kang, Jun Li, Dacheng Tao, Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering and Information Technology, University of Technology Sydney, {Guoliang.Kang@student, Jun.Li@, Dacheng.Tao@}uts.edu.au
Pseudocode | No | No structured pseudocode or clearly labeled algorithm blocks were found. The paper describes the Shakeout operations in a textual step-by-step format with equations, but not as an algorithm block.
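Because the operation is given only as in-text equations, a reader has to transcribe it by hand. The sketch below is one assumed reading of the Shakeout weight perturbation, not copied from the paper or from this report: it is chosen so that the perturbed weight is unbiased (its expectation equals the original weight) and so that c = 0 recovers standard Dropout. The function name shakeout_weights and the per-input-unit Bernoulli draw are illustrative assumptions and should be checked against the paper's equations.

```python
# Assumed reading of a Shakeout-style weight perturbation (NOT reproduced from
# the paper's equations): each incoming unit j draws r_j ~ Bernoulli(1 - tau),
# and the effective weight is
#   (w_ij + c * tau * sign(w_ij)) / (1 - tau)   if r_j = 1  (contribution enhanced)
#   -c * sign(w_ij)                             if r_j = 0  (contribution reversed)
# so that E[w_ij_tilde] = w_ij, and c = 0 reduces to inverted-scaling Dropout.
import torch

def shakeout_weights(w: torch.Tensor, tau: float = 0.5, c: float = 1.0) -> torch.Tensor:
    """Return a stochastically perturbed copy of a weight matrix w of shape (out, in)."""
    keep = (torch.rand(w.shape[1]) > tau).float()   # one r_j per input unit
    s = torch.sign(w)
    kept = (w + c * tau * s) / (1.0 - tau)          # weights of retained units
    dropped = -c * s                                # weights of "reversed" units
    return keep * kept + (1.0 - keep) * dropped
```

In a training loop such a perturbation would be applied to a layer's weights at each forward pass, with the clean weights used at test time, in the same spirit as Dropout.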
Open Source Code | No | No explicit statement providing access to the source code for the Shakeout methodology was found. The paper only mentions: 'All the experiments are implemented based on the modifications of Caffe library (Jia et al. 2014).'
Open Datasets | Yes | The hand-written image dataset MNIST (LeCun et al. 1998) and the CIFAR-10 image dataset (Krizhevsky and Hinton 2009)
Dataset Splits | Yes | MNIST consists of 60k+10k (training+testing) 28×28 images of hand-written digits. We separate 10,000 training samples from the original training dataset for validation. CIFAR-10 contains 50k+10k (training+testing) 32×32 images of 10 object classes. In this experiment, 10,000 colour images are separated from the training dataset for validation.
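These splits are straightforward to re-create outside Caffe. A minimal sketch using torchvision's standard MNIST and CIFAR-10 loaders follows; the loaders, the fixed random seed, and the random choice of the 10,000 validation images are assumptions, since the paper does not say how the validation samples were selected.

```python
# Illustrative re-creation of the reported train/validation/test splits.
# The authors used Caffe; torchvision is only a convenient stand-in here.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST: 60k training images, of which 10,000 are held out for validation.
mnist_full = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
mnist_train, mnist_val = random_split(
    mnist_full, [50_000, 10_000], generator=torch.Generator().manual_seed(0)
)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=to_tensor)

# CIFAR-10: 50k training images, of which 10,000 are held out for validation.
cifar_full = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar_train, cifar_val = random_split(
    cifar_full, [40_000, 10_000], generator=torch.Generator().manual_seed(0)
)
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)
```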
Hardware Specification | No | No specific hardware details (such as CPU/GPU models, memory, or cloud specifications) used for running the experiments were mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers were provided. The paper only mentions: 'All the experiments are implemented based on the modifications of Caffe library (Jia et al. 2014).'
Experiment Setup | Yes | The autoencoder adopted contains one hidden layer of 256 units, each of which is connected to the 28×28 image pixels and followed by a hyperbolic tangent (i.e. tanh) activation function. ... Dropout (τ = 0.5), and Shakeout (τ = 0.5, c = {1, 10}). ... For the fully-connected neural network, a big hidden layer size is adopted with its value at 4096. The non-linear activation unit adopted is the rectifier linear unit (ReLU). The deep convolutional neural network employed contains two convolutional layers and two fully connected layers. The detailed architecture information of this convolutional neural network is described in Tab. 1. ... We first train for 100 epochs with an initial learning rate of 0.001 and then another 50 epochs with the learning rate of 0.0001. ... no data augmentation is utilized except that the per-pixel mean computed over the training set is subtracted from each image.
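For orientation, a rough PyTorch sketch of the fully-connected MNIST setup and the two-stage learning-rate schedule described above follows. The optimizer choice, momentum value, batch handling, and the use of nn.Dropout as a stand-in for the Shakeout operation are assumptions, not details taken from the paper.

```python
# Illustrative sketch of the described setup: one 4096-unit ReLU hidden layer,
# 100 epochs at lr 1e-3 followed by 50 epochs at lr 1e-4, and per-pixel mean
# subtraction as the only preprocessing. Dropout(p=0.5) stands in for Shakeout,
# whose exact stochastic weight operation is defined by the paper's equations.
import torch
import torch.nn as nn

class FCNet(nn.Module):
    def __init__(self, drop_p: float = 0.5):  # corresponds to tau = 0.5
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(drop_p),            # placeholder for the Shakeout operation
            nn.Linear(28 * 28, 4096),
            nn.ReLU(),
            nn.Dropout(drop_p),
            nn.Linear(4096, 10),
        )

    def forward(self, x):
        return self.net(x)

def train(model, train_loader, pixel_mean, device="cpu"):
    """Two-stage schedule: 100 epochs at lr 1e-3, then 50 epochs at lr 1e-4."""
    criterion = nn.CrossEntropyLoss()
    model.to(device)
    for lr, epochs in [(1e-3, 100), (1e-4, 50)]:
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)  # optimizer assumed
        for _ in range(epochs):
            for images, labels in train_loader:
                # subtract the per-pixel mean computed over the training set
                images = images.to(device) - pixel_mean.to(device)
                loss = criterion(model(images), labels.to(device))
                opt.zero_grad()
                loss.backward()
                opt.step()
```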