Shakeout: A New Regularized Deep Neural Network Training Scheme

Authors: Guoliang Kang, Jun Li, Dacheng Tao

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have empirically evaluated the Shakeout scheme and demonstrated that sparse network weights are obtained via Shakeout training. Our classification experiments on real-life image datasets MNIST and CIFAR10 show that Shakeout deals with over-fitting effectively. ... In this section, we report empirical evaluation of the Shakeout scheme in training deep neural networks on real-life datasets.
Researcher Affiliation | Academia | Guoliang Kang, Jun Li, Dacheng Tao, Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering and Information Technology, University of Technology Sydney, {Guoliang.Kang@student, Jun.Li@, Dacheng.Tao@}uts.edu.au
Pseudocode | No | No structured pseudocode or clearly labeled algorithm blocks were found. The paper describes the Shakeout operations in a textual step-by-step format with equations, but not as an algorithm block.
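Because the operation is given only as in-text equations, a reader has to transcribe it by hand. The sketch below is one assumed reading of the Shakeout weight perturbation, not copied from the paper or from this report: it is chosen so that the perturbed weight is unbiased (its expectation equals the original weight) and so that c = 0 recovers standard Dropout. The function name shakeout_weights and the per-input-unit Bernoulli draw are illustrative assumptions and should be checked against the paper's equations.

```python
# Assumed reading of a Shakeout-style weight perturbation (NOT reproduced from
# the paper's equations): each incoming unit j draws r_j ~ Bernoulli(1 - tau),
# and the effective weight is
#   (w_ij + c * tau * sign(w_ij)) / (1 - tau)   if r_j = 1  (contribution enhanced)
#   -c * sign(w_ij)                             if r_j = 0  (contribution reversed)
# so that E[w_ij_tilde] = w_ij, and c = 0 reduces to inverted-scaling Dropout.
import torch

def shakeout_weights(w: torch.Tensor, tau: float = 0.5, c: float = 1.0) -> torch.Tensor:
    """Return a stochastically perturbed copy of a weight matrix w of shape (out, in)."""
    keep = (torch.rand(w.shape[1]) > tau).float()   # one r_j per input unit
    s = torch.sign(w)
    kept = (w + c * tau * s) / (1.0 - tau)          # weights of retained units
    dropped = -c * s                                # weights of "reversed" units
    return keep * kept + (1.0 - keep) * dropped
```

In a training loop such a perturbation would be applied to a layer's weights at each forward pass, with the clean weights used at test time, in the same spirit as Dropout.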
Open Source Code | No | No explicit statement providing access to the source code for the Shakeout methodology was found. The paper only mentions: 'All the experiments are implemented based on the modifications of Caffe library (Jia et al. 2014).'
Open Datasets | Yes | The hand-written image dataset MNIST (LeCun et al. 1998) and the CIFAR-10 image dataset (Krizhevsky and Hinton 2009)
Dataset Splits | Yes | MNIST consists of 60k+10k (training+testing) 28×28 images of hand-written digits. We separate 10,000 training samples from the original training dataset for validation. CIFAR-10 contains 50k+10k (training+testing) 32×32 images of 10 object classes. In this experiment, 10,000 colour images are separated from the training dataset for validation.
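These splits are straightforward to re-create outside Caffe. A minimal sketch using torchvision's standard MNIST and CIFAR-10 loaders follows; the loaders, the fixed random seed, and the random choice of the 10,000 validation images are assumptions, since the paper does not say how the validation samples were selected.

```python
# Illustrative re-creation of the reported train/validation/test splits.
# The authors used Caffe; torchvision is only a convenient stand-in here.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST: 60k training images, of which 10,000 are held out for validation.
mnist_full = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
mnist_train, mnist_val = random_split(
    mnist_full, [50_000, 10_000], generator=torch.Generator().manual_seed(0)
)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=to_tensor)

# CIFAR-10: 50k training images, of which 10,000 are held out for validation.
cifar_full = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar_train, cifar_val = random_split(
    cifar_full, [40_000, 10_000], generator=torch.Generator().manual_seed(0)
)
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)
```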
Hardware Specification | No | No specific hardware details (such as CPU/GPU models, memory, or cloud specifications) used for running the experiments were mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers were provided. The paper only mentions: 'All the experiments are implemented based on the modifications of Caffe library (Jia et al. 2014).'
Experiment Setup | Yes | The autoencoder adopted contains one hidden layer of 256 units, each of which is connected to the 28×28 image pixels and followed by a hyperbolic tangent (i.e. tanh) activation function. ... Dropout (τ = 0.5), and Shakeout (τ = 0.5, c = {1, 10}). ... For the fully-connected neural network, a big hidden layer size is adopted with its value at 4096. The non-linear activation unit adopted is the rectifier linear unit (ReLU). The deep convolutional neural network employed contains two convolutional layers and two fully connected layers. The detailed architecture information of this convolutional neural network is described in Tab. 1. ... We first train for 100 epochs with an initial learning rate of 0.001 and then another 50 epochs with the learning rate of 0.0001. ... no data augmentation is utilized except that the per-pixel mean computed over the training set is subtracted from each image.
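For orientation, a rough PyTorch sketch of the fully-connected MNIST setup and the two-stage learning-rate schedule described above follows. The optimizer choice, momentum value, batch handling, and the use of nn.Dropout as a stand-in for the Shakeout operation are assumptions, not details taken from the paper.

```python
# Illustrative sketch of the described setup: one 4096-unit ReLU hidden layer,
# 100 epochs at lr 1e-3 followed by 50 epochs at lr 1e-4, and per-pixel mean
# subtraction as the only preprocessing. Dropout(p=0.5) stands in for Shakeout,
# whose exact stochastic weight operation is defined by the paper's equations.
import torch
import torch.nn as nn

class FCNet(nn.Module):
    def __init__(self, drop_p: float = 0.5):  # corresponds to tau = 0.5
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(drop_p),            # placeholder for the Shakeout operation
            nn.Linear(28 * 28, 4096),
            nn.ReLU(),
            nn.Dropout(drop_p),
            nn.Linear(4096, 10),
        )

    def forward(self, x):
        return self.net(x)

def train(model, train_loader, pixel_mean, device="cpu"):
    """Two-stage schedule: 100 epochs at lr 1e-3, then 50 epochs at lr 1e-4."""
    criterion = nn.CrossEntropyLoss()
    model.to(device)
    for lr, epochs in [(1e-3, 100), (1e-4, 50)]:
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)  # optimizer assumed
        for _ in range(epochs):
            for images, labels in train_loader:
                # subtract the per-pixel mean computed over the training set
                images = images.to(device) - pixel_mean.to(device)
                loss = criterion(model(images), labels.to(device))
                opt.zero_grad()
                loss.backward()
                opt.step()
```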