Multi-Bias Non-linear Activation in Deep Neural Networks

Authors: Hongyang Li, Wanli Ouyang, Xiaogang Wang

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed MBA module and compare with other state-of-the-art methods on several benchmarks. The CIFAR-10 dataset (Krizhevsky et al., 2012) consists of 32×32 color images in 10 classes, with 50,000 training images and 10,000 testing images.
Researcher Affiliation | Academia | Hongyang Li (YANGLI@EE.CUHK.EDU.HK), Wanli Ouyang (WLOUYANG@EE.CUHK.EDU.HK), Xiaogang Wang (XGWANG@EE.CUHK.EDU.HK), The Chinese University of Hong Kong
Pseudocode | No | The paper describes the model and processes using mathematical formulations and textual descriptions, but it does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Finally, the implementation code is available at https://github.com/hli2020/caffe/tree/bias.
Open Datasets | Yes | The CIFAR-10 dataset (Krizhevsky et al., 2012) consists of 32×32 color images in 10 classes, with 50,000 training images and 10,000 testing images. The CIFAR-100 dataset has the same size and format as CIFAR-10 but contains 100 classes, with only one tenth as many labeled examples per class. The SVHN (Netzer et al., 2011) dataset resembles MNIST and consists of color images of house numbers captured by Google Street View.
Dataset Splits | Yes | We follow a validation split from the training set similar to (Goodfellow et al., 2013): one tenth of the samples per class from the training set on CIFAR, and 400 plus 200 samples per class from the training and extra sets on SVHN, are selected to build the validation set.
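
As a concrete reading of the quoted split, the sketch below shows how such a per-class validation set could be carved out. This is not code from the paper; the helper name, seeding, and array variables are hypothetical and only the per-class counts come from the quote.

```python
import numpy as np

def per_class_validation_split(labels, n_val_per_class, seed=0):
    """Hold out n_val_per_class indices for every class as a validation set;
    the remaining indices form the reduced training set."""
    rng = np.random.default_rng(seed)
    val_idx = []
    for c in np.unique(labels):
        cls_idx = np.flatnonzero(labels == c)
        val_idx.extend(rng.choice(cls_idx, size=n_val_per_class, replace=False))
    val_idx = np.array(sorted(val_idx))
    train_idx = np.setdiff1d(np.arange(len(labels)), val_idx)
    return train_idx, val_idx

# CIFAR: one tenth of the samples per class (5,000 per class -> 500 held out).
# train_idx, val_idx = per_class_validation_split(cifar_labels, 500)
# SVHN: 400 per class from the training set plus 200 per class from the extra set.
# tr_idx, val_tr = per_class_validation_split(svhn_train_labels, 400)
# ex_idx, val_ex = per_class_validation_split(svhn_extra_labels, 200)
```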
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'caffe' as part of the code repository URL, but it does not specify version numbers for Caffe or any other software dependencies.
Experiment Setup | Yes | Our baseline network has three stacks of convolutional layers, with each stack containing three convolutional layers, resulting in a total of nine layers. The stacks have [96-96-96], [128-128-128] and [256-256-512] filters, respectively. The kernel size is 3, padded by 1 pixel on each side, with stride 1 for all convolutional layers. At the end of each convolutional stack is a max-pooling operation with kernel and stride size of 2. The two fully connected layers have 2048 neurons each. We also apply dropout with ratio 0.5 after each fully connected layer. The final layer is a softmax classification layer. The optimal training hyperparameters are determined on each validation set. We set the momentum to 0.9 and the weight decay to 0.005. The base learning rate is set to 0.1, 0.1 and 0.05, respectively. We drop the learning rate by 10% around every 40 epochs in a continuous exponential way and stop decreasing the learning rate once it reaches a minimum value (0.0001). ... We use the hyperparameter K = 4 for the MBA module and a mini-batch size of 100 for stochastic gradient descent. All the convolutional layers are initialized from a Gaussian distribution with mean zero and standard deviation 0.05 or 0.1.
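
Taken together, the quoted setup describes a nine-layer convolutional baseline, an MBA module with K = 4 biases per feature map, and plain SGD. The sketch below is a rough PyTorch reconstruction of that configuration, not the authors' Caffe implementation linked above; the `MBA` layer, the helper names, and where the module would be inserted are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MBA(nn.Module):
    """Multi-Bias non-linear Activation (sketch): every input feature map is
    replicated K times, a separate learnable bias is added to each copy, and a
    ReLU is applied, so C input maps become K*C output maps."""
    def __init__(self, channels, k=4):
        super().__init__()
        self.k = k
        self.bias = nn.Parameter(torch.zeros(k * channels))

    def forward(self, x):
        x = x.repeat_interleave(self.k, dim=1)             # N x (K*C) x H x W
        return torch.relu(x + self.bias.view(1, -1, 1, 1))

def conv_stack(in_ch, widths):
    """One stack of three 3x3 convolutions (stride 1, pad 1) plus 2x2 max pooling."""
    layers, prev = [], in_ch
    for w in widths:
        layers += [nn.Conv2d(prev, w, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
        prev = w
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers), prev

# Baseline: stacks of [96-96-96], [128-128-128], [256-256-512] filters,
# followed by two fully connected layers of 2048 units with dropout 0.5.
stack1, c1 = conv_stack(3,  [96, 96, 96])
stack2, c2 = conv_stack(c1, [128, 128, 128])
stack3, c3 = conv_stack(c2, [256, 256, 512])
model = nn.Sequential(
    stack1, stack2, stack3,
    nn.Flatten(),                      # 32x32 input -> 4x4 maps after 3 poolings
    nn.Linear(c3 * 4 * 4, 2048), nn.ReLU(inplace=True), nn.Dropout(0.5),
    nn.Linear(2048, 2048), nn.ReLU(inplace=True), nn.Dropout(0.5),
    nn.Linear(2048, 10),               # softmax is folded into the loss
)

# Gaussian initialisation (the paper uses std 0.05 or 0.1, chosen per layer).
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.05)
        nn.init.zeros_(m.bias)

# Quoted optimisation settings: SGD with momentum 0.9, weight decay 0.005,
# base learning rate 0.1 (or 0.05), mini-batch size 100, and an exponential
# learning-rate decay roughly every 40 epochs down to a floor of 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.005)
criterion = nn.CrossEntropyLoss()
```

In an MBA variant of this baseline, selected ReLU activations would presumably be replaced by `MBA(channels, k=4)`, with the input channel count of the following convolution scaled by K accordingly; the exact placement is not specified in the quoted setup.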