Nonparametrically Learning Activation Functions in Deep Neural Nets
Authors: Carson Eisenach, Zhaoran Wang, Han Liu
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the power of our novel techniques, we test them on image recognition datasets and achieve up to a 15% relative increase in test performance compared to the baseline. |
| Researcher Affiliation | Academia | Carson Eisenach (Princeton University, eisenach@princeton.edu); Han Liu (Princeton University, hanliu@princeton.edu); Zhaoran Wang (Princeton University, zhaoran@princeton.edu) |
| Pseudocode | Yes | Algorithm 1 Generic Two Stage Training for Deep Convolutional Neural Networks |
| Open Source Code | No | The paper mentions using libraries and toolkits such as Theano, CUDA, and cuDNN, but does not provide a link to, or an explicit statement about, the availability of the authors' own implementation code. |
| Open Datasets | Yes | For MNIST: This dataset consists of 60,000 training and 10,000 testing images. ... The dataset is from Lecun & Cortes (1999). For CIFAR-10: The CIFAR-10 dataset is due to Krizhevsky (2009). This dataset consists of 50,000 training and 10,000 test images. |
| Dataset Splits | No | The paper explicitly states the number of training and testing images for MNIST and CIFAR-10, but it does not specify a separate validation dataset split. |
| Hardware Specification | Yes | We ran our simulations on Princeton's SMILE server and TIGER computing cluster. The SMILE server was equipped with a single Tesla K40c GPU, while the TIGER cluster has 200 K20 GPUs. |
| Software Dependencies | No | The paper mentions using 'Theano (Bergstra et al., 2010)', 'CUDA (John Nickolls, 2008)', and 'cuDNN (Chetlur et al., 2014)', but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | A mini-batch size of 250 was used for all experiments. For dropout nets we follow Srivastava et al. (2014) and use a dropout of 0.9 on the input, 0.75 in the convolution layers and 0.5 in the fully connected layers. (An illustrative configuration sketch follows the table.) |
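
The dropout and mini-batch settings quoted in the Experiment Setup row translate directly into a network configuration. Below is a minimal sketch in PyTorch, not the authors' Theano implementation: the layer sizes and the ReLU placeholder (standing in for the paper's nonparametrically learned activations) are illustrative assumptions, and the dropout values are converted to drop rates on the assumption that the quoted numbers are retention probabilities in the convention of Srivastava et al. (2014).

```python
# A minimal sketch, assuming the quoted dropout values are retention
# probabilities as in Srivastava et al. (2014); drop rates are therefore 1 - p.
# This is NOT the authors' Theano implementation: layer sizes and the ReLU
# placeholder (standing in for the learned activations) are assumptions.
import torch
import torch.nn as nn


class IllustrativeConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Dropout(1 - 0.9),   # input dropout (retain 0.9)
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),             # placeholder for the nonparametric activation
            nn.Dropout(1 - 0.75),  # convolutional-layer dropout (retain 0.75)
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(1 - 0.5),   # fully-connected dropout (retain 0.5)
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


if __name__ == "__main__":
    # Dummy CIFAR-10-shaped tensors, only to demonstrate the mini-batch size of 250.
    images = torch.randn(1000, 3, 32, 32)
    labels = torch.randint(0, 10, (1000,))
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(images, labels),
        batch_size=250, shuffle=True,
    )
    model = IllustrativeConvNet()
    batch_images, _ = next(iter(loader))
    print(model(batch_images).shape)  # torch.Size([250, 10])
```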