Deep Learning with S-Shaped Rectified Linear Activation Units
Authors: Xiaojie Jin, Chunyan Xu, Jiashi Feng, Yunchao Wei, Junjun Xiong, Shuicheng Yan
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with two popular CNN architectures, Network in Network and GoogLeNet on scale-various benchmarks including CIFAR10, CIFAR100, MNIST and ImageNet demonstrate that SReLU achieves remarkable improvement compared to other activation functions. |
| Researcher Affiliation | Collaboration | (1) NUS Graduate School for Integrative Science and Engineering, NUS; (2) Department of ECE, NUS; (3) Beijing Samsung Telecom R&D Center; (4) School of CSE, Nanjing University of Science and Technology |
| Pseudocode | No | The paper describes the mathematical formulations and update rules for SReLU but does not provide a formal pseudocode or algorithm block (a hedged sketch of the piecewise definition is given after this table). |
| Open Source Code | Yes | The codes of SReLU are available at https://github.com/AIROBOTAI/caffe/tree/SReLU. |
| Open Datasets | Yes | We conduct experiments on four datasets with different scales, including CIFAR-10, CIFAR-100 (Krizhevsky and Hinton 2009), MNIST (LeCun et al. 1998) and a much larger dataset, ImageNet (Deng et al. 2009) |
| Dataset Splits | Yes | For every dataset, we randomly sample 20% of the total training data as the validation set to configure the needed hyperparameters in different methods. |
| Hardware Specification | Yes | To reduce the training time, four NVIDIA TITAN GPUs are employed in parallel for training. Other hardware information of the PCs we use includes Intel Core i7 3.3GHz CPU, 64G RAM and 2T hard disk. |
| Software Dependencies | No | The paper states 'We choose Caffe (Jia et al. 2014) as the platform to conduct our experiments,' but it does not specify a version number for Caffe or any other software dependencies. |
| Experiment Setup | Yes | For the setting of hyperparameters (such as learning rate, weight decay and dropout ratio, etc.), we follow the published configurations of original networks. ... For SReLU, we use a^l = 0.2 and k = 0.9|X_i| for all datasets. |
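
As the Pseudocode row notes, the paper presents SReLU only as mathematical formulations. The snippet below is a minimal NumPy sketch of the standard piecewise SReLU definition (four learnable parameters t^r, a^r, t^l, a^l, learned per channel in the paper but treated as scalars here for brevity). Only the left slope a^l = 0.2 comes from the quoted setup; the other values are illustrative assumptions, and this is not the released Caffe implementation.

```python
import numpy as np

def srelu(x, t_r, a_r, t_l, a_l):
    """Piecewise-linear SReLU:
       slope a_r above the right threshold t_r,
       identity for t_l < x < t_r,
       slope a_l below the left threshold t_l."""
    return np.where(
        x >= t_r, t_r + a_r * (x - t_r),           # right linear segment
        np.where(x <= t_l, t_l + a_l * (x - t_l),  # left linear segment
                 x))                                # identity in between

# a_l = 0.2 matches the setting quoted above; t_r, a_r, t_l are illustrative.
x = np.linspace(-3.0, 3.0, 7)
print(srelu(x, t_r=1.0, a_r=0.5, t_l=0.0, a_l=0.2))
```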
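
The Dataset Splits row states that 20% of the training data is randomly held out as a validation set for hyperparameter selection. A minimal sketch of such a split, with random arrays standing in for any of the datasets (array shapes are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in training set with CIFAR-like sample shapes (purely illustrative).
train_x = rng.normal(size=(1000, 3, 32, 32)).astype(np.float32)
train_y = rng.integers(0, 10, size=1000)

# Randomly hold out 20% of the training data as the validation set.
perm = rng.permutation(len(train_x))
n_val = int(0.2 * len(train_x))
val_idx, tr_idx = perm[:n_val], perm[n_val:]
val_x, val_y = train_x[val_idx], train_y[val_idx]
tr_x, tr_y = train_x[tr_idx], train_y[tr_idx]
print(tr_x.shape, val_x.shape)  # (800, 3, 32, 32) (200, 3, 32, 32)
```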