Deep Learning with S-Shaped Rectified Linear Activation Units

Authors: Xiaojie Jin, Chunyan Xu, Jiashi Feng, Yunchao Wei, Junjun Xiong, Shuicheng Yan

AAAI 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments with two popular CNN architectures, Network in Network and GoogLeNet, on scale-various benchmarks including CIFAR10, CIFAR100, MNIST and ImageNet demonstrate that SReLU achieves remarkable improvement compared to other activation functions. |
| Researcher Affiliation | Collaboration | 1. NUS Graduate School for Integrative Science and Engineering, NUS; 2. Department of ECE, NUS; 3. Beijing Samsung Telecom R&D Center; 4. School of CSE, Nanjing University of Science and Technology |
| Pseudocode | No | The paper describes the mathematical formulations and update rules for SReLU but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | Yes | The codes of SReLU are available at https://github.com/AIROBOTAI/caffe/tree/SReLU. |
| Open Datasets | Yes | We conduct experiments on four datasets with different scales, including CIFAR-10, CIFAR-100 (Krizhevsky and Hinton 2009), MNIST (LeCun et al. 1998) and a much larger dataset, ImageNet (Deng et al. 2009). |
| Dataset Splits | Yes | For every dataset, we randomly sample 20% of the total training data as the validation set to configure the needed hyperparameters in different methods. |
| Hardware Specification | Yes | To reduce the training time, four NVIDIA TITAN GPUs are employed in parallel for training. Other hardware information of the PCs we use includes an Intel Core i7 3.3GHz CPU, 64G RAM and a 2T hard disk. |
| Software Dependencies | No | The paper states "We choose Caffe (Jia et al. 2014) as the platform to conduct our experiments," but it does not specify a version number for Caffe or any other software dependencies. |
| Experiment Setup | Yes | For the setting of hyperparameters (such as learning rate, weight decay and dropout ratio, etc.), we follow the published configurations of original networks. ... For SReLU, we use a^l = 0.2 and k = 0.9\|X_i\| for all datasets. |
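For reference, the SReLU activation the paper defines is piecewise linear with three segments: identity between a lower threshold t^l and an upper threshold t^r, and linear with slopes a^l and a^r outside them. A minimal NumPy sketch is below; in the paper all four parameters are learned per channel, so the t^r, a^r, and t^l values here are illustrative placeholders, and only a^l = 0.2 matches the reported initialization.

```python
import numpy as np

def srelu(x, tr=1.0, ar=0.5, tl=0.0, al=0.2):
    """S-shaped ReLU: identity on (tl, tr), linear with slope ar
    above tr and slope al below tl (parameters are learnable
    per channel in the paper; fixed here for illustration)."""
    return np.where(x >= tr, tr + ar * (x - tr),
                    np.where(x <= tl, tl + al * (x - tl), x))

out = srelu(np.array([-1.0, 0.5, 2.0]))
# inputs below tl are scaled by al, inputs in (tl, tr) pass
# through unchanged, inputs above tr are scaled by ar
```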