Learning the Number of Neurons in Deep Networks
Authors: Jose M. Alvarez, Mathieu Salzmann
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the ability of our method to automatically determine the number of neurons on the task of large-scale classification. To this end, we study three different architectures and analyze the behavior of our method on three different datasets, with a particular focus on parameter reduction. Below, we first describe our experimental setup and then discuss our results. |
| Researcher Affiliation | Academia | Jose M. Alvarez Data61 @ CSIRO Canberra, ACT 2601, Australia jose.alvarez@data61.csiro.au Mathieu Salzmann CVLab, EPFL CH-1015 Lausanne, Switzerland mathieu.salzmann@epfl.ch |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | For our experiments, we used two large-scale image classification datasets, ImageNet [Russakovsky et al., 2015] and Places2-401 [Zhou et al., 2015]. Furthermore, we conducted additional experiments on the character recognition dataset of [Jaderberg et al., 2014a]. |
| Dataset Splits | Yes | We used the ILSVRC-2012 [Russakovsky et al., 2015] subset consisting of 1000 categories, with 1.2 million training images and 50,000 validation images. Finally, the ICDAR character recognition dataset of [Jaderberg et al., 2014a] consists of 185,639 training and 5,198 test samples split into 36 categories. |
| Hardware Specification | Yes | More specifically, for ImageNet and Places2-401, we used the torch-7 multi-gpu framework [Collobert et al., 2011] on a Dual Xeon 8-core E5-2650 with 128GB of RAM using three Kepler Tesla K20m GPUs in parallel. All models were trained for a total of 55 epochs with 12,000 batches per epoch and a batch size of 48 and 180 for BNet and Dec8, respectively. For ICDAR, we trained each network on a single Tesla K20m GPU for a total of 45 epochs with a batch size of 256 and 1,000 iterations per epoch. |
| Software Dependencies | No | The paper mentions using the 'torch-7 multi-gpu framework [Collobert et al., 2011]' but does not provide a specific version number for Torch or any other software libraries or dependencies. |
| Experiment Setup | Yes | All models were trained for a total of 55 epochs with 12,000 batches per epoch and a batch size of 48 and 180 for BNet and Dec8, respectively. The learning rate was set to an initial value of 0.01 and then multiplied by 0.1. Data augmentation was done through random crops and random horizontal flips with probability 0.5. For ICDAR, we trained each network on a single Tesla K20m GPU for a total of 45 epochs with a batch size of 256 and 1,000 iterations per epoch. In this case, the learning rate was set to an initial value of 0.1 and multiplied by 0.1 in the second, seventh and fifteenth epochs. We used a momentum of 0.9. In terms of hyper-parameters, for large-scale classification, we used λl = 0.102 for the first three layers and λl = 0.255 for the remaining ones. For ICDAR, we used λl = 5.1 for the first layer and λl = 10.2 for the remaining ones. (A hedged sketch of this configuration follows the table.) |
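
To make the reported schedule concrete, below is a minimal sketch of the ICDAR training configuration from the Experiment Setup row. It is written in PyTorch purely for illustration (the paper used the Torch-7 framework), the model is a trivial placeholder for the actual architectures (BNet / Dec8), the crop size is not restated in this section, and the `group_sparsity_penalty` helper is a hypothetical illustration of how the per-layer λl weights could enter the loss; it is not the paper's actual regularizer.

```python
# Hedged sketch of the reported training configuration (assumptions noted above).
import torch
import torch.nn as nn
from torchvision import transforms

CROP_SIZE = 32     # placeholder; the section does not restate the input size
NUM_CLASSES = 36   # ICDAR: 36 character categories
BATCH_SIZE = 256   # as reported for ICDAR
EPOCHS = 45        # as reported for ICDAR

# Placeholder model standing in for the networks trained in the paper.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * CROP_SIZE * CROP_SIZE, NUM_CLASSES))

# Reported optimizer settings: SGD with momentum 0.9; initial LR 0.1 for ICDAR
# (0.01 for the large-scale ImageNet / Places2-401 runs), multiplied by 0.1
# in the second, seventh and fifteenth epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2, 7, 15], gamma=0.1)

# Reported data augmentation: random crops and random horizontal flips with p = 0.5.
augmentation = transforms.Compose([
    transforms.RandomCrop(CROP_SIZE),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# Hypothetical helper showing one way the per-layer weights lambda_l
# (0.102 / 0.255 for large-scale classification, 5.1 / 10.2 for ICDAR)
# could scale a group penalty over each neuron's incoming weights.
def group_sparsity_penalty(linear_layers, lambdas):
    penalty = torch.tensor(0.0)
    for layer, lam in zip(linear_layers, lambdas):
        # One group per output neuron: the L2 norm of that neuron's weight row.
        penalty = penalty + lam * layer.weight.norm(p=2, dim=1).sum()
    return penalty
```

In such a sketch, `scheduler.step()` would be called once per epoch and the penalty added to the classification loss before backpropagation; none of this is prescribed by the section beyond the quoted hyper-parameter values.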