Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units

Authors: Wenling Shang, Kihyuk Sohn, Diogo Almeida, Honglak Lee

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We integrate CReLU into several state-of-the-art CNN architectures and demonstrate improvement in their recognition performance on CIFAR-10/100 and ImageNet datasets with fewer trainable parameters." (A minimal CReLU sketch follows the table.)
Researcher Affiliation | Collaboration | University of Michigan, Ann Arbor; NEC Laboratories America; Enlitic; Oculus VR
Pseudocode | Yes | "We use a simple linear reconstruction algorithm (see Algorithm 1 in the supplementary materials) to reconstruct the original image from conv1-conv4 features (left to right)."
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or provide any links to a code repository.
Open Datasets | Yes | "We evaluate the effectiveness of the CReLU activation scheme on three benchmark datasets: CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009)."
Dataset Splits | Yes | "Since the datasets don't provide a pre-defined validation set, we conduct two different cross-validation schemes: 1. Single: we hold out a subset of the training set for initial training and retrain the network from scratch using the whole training set until we reach the same loss on the hold-out set (Goodfellow et al., 2013). 2. 10-folds: we divide the training set into 10 folds and do validation on each of the 10 folds while training the networks on the remaining 9 folds." (A sketch of the 10-fold scheme follows the table.)
Hardware Specification | No | The paper only mentions "NVIDIA for the donation of GPUs" in the acknowledgments, which is too general and lacks the specific model numbers or configurations required for reproducibility.
Software Dependencies | No | The paper does not name specific software with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) in its main text.
Experiment Setup | Yes | "Note that the models with CReLU activation don't need significant hyperparameter tuning from the baseline ReLU model, and in most of our experiments, we only tune the dropout rate while other hyperparameters (e.g., learning rate, mini-batch size) remain the same. We also replace ReLU with AVR for comparison with CReLU. [...] We subtract the mean and divide by the standard deviation for preprocessing and use random horizontal flip for data augmentation." (A preprocessing sketch follows the table.)
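
For reference, the CReLU activation evaluated in the paper concatenates ReLU applied to a feature map and to its negation along the channel axis, doubling the number of output channels. The module below is a minimal sketch of that definition; the class name and the choice of PyTorch are ours, not the authors'.

```python
import torch
import torch.nn as nn

class CReLU(nn.Module):
    """Concatenated ReLU: returns [ReLU(x), ReLU(-x)] stacked along the
    channel dimension, so a layer with C input channels yields 2C outputs."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat((torch.relu(x), torch.relu(-x)), dim=1)
```

Because the activation doubles the channel count, the authors can use fewer convolution filters per layer, which is how the reported savings in trainable parameters arise.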
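The 10-folds scheme quoted under Dataset Splits can be sketched as follows; the function name and NumPy-based implementation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def ten_fold_splits(n_samples: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs: each of the 10 folds serves once as
    the validation set while the remaining 9 folds are used for training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), 10)
    for k in range(10):
        val_idx = folds[k]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train_idx, val_idx
```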
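The preprocessing quoted under Experiment Setup (per-channel standardization plus random horizontal flips) corresponds to a standard CIFAR pipeline such as the torchvision sketch below; the mean and standard deviation values are the commonly used CIFAR-10 statistics and are an assumption, since the paper does not list them.

```python
import torchvision.transforms as T

# Illustrative CIFAR-10 training transform: subtract the per-channel mean,
# divide by the per-channel standard deviation, and apply random horizontal
# flips for augmentation. The statistics below are assumed, not from the paper.
train_transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465),
                std=(0.2470, 0.2435, 0.2616)),
])
```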