Deep Networks with Internal Selective Attention through Feedback Connections
Authors: Marijn F Stollenga, Jonathan Masci, Faustino Gomez, Jürgen Schmidhuber
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the CIFAR-10 and CIFAR-100 datasets, dasNet outperforms the previous state-of-the-art model on unaugmented datasets. |
| Researcher Affiliation | Academia | IDSIA, USI-SUPSI Manno-Lugano, Switzerland {marijn,jonathan,tino,juergen}@idsia.ch |
| Pseudocode | Yes | Algorithm 1 TRAINDASNET(M, µ, Σ, p, n). A sketch of the SNES loop behind this algorithm follows the table. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a direct link to a repository. |
| Open Datasets | Yes | On the CIFAR-10 and CIFAR-100 datasets, dasNet outperforms the previous state-of-the-art model on unaugmented datasets. The CIFAR-10 dataset [20] is composed of 32 × 32 colour images split into 5 × 10^4 training and 10^4 testing samples, where each image is assigned to one of 10 classes. The CIFAR-100 is similarly composed, but contains 100 classes. |
| Dataset Splits | No | The paper mentions 5 × 10^4 training and 10^4 testing samples for CIFAR-10, but does not explicitly state a validation split. |
| Hardware Specification | Yes | Training of dasNet took around 4 days on a GTX 560 Ti GPU, excluding the original time used to train M. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The number of steps was experimentally determined and fixed at T = 5; small enough to be computationally tractable while still allowing for enough interaction. In all experiments λ_correct = 0.005, λ_misclassified = 1 and λ_L2 = 0.005. The Maxout network, M, was trained with data augmentation following global contrast normalization and ZCA normalization. The model consists of three convolutional maxout layers followed by a fully connected maxout layer and softmax outputs. Dropout of 0.5 was used in all layers except the input layer, which used 0.2. The population size for SNES was set to 50. The fitness weighting and the preprocessing pipeline are sketched after this table. |
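
The pseudocode row above refers to Algorithm 1, which trains the attention policy with Separable Natural Evolution Strategies (SNES). Below is a minimal NumPy sketch of such an SNES outer loop, assuming a hypothetical `evaluate_policy` callback that runs the pre-trained Maxout network M for T = 5 feedback steps and returns a scalar fitness. The learning rates are assumptions, not values reported in the paper.

```python
import numpy as np

T = 5                       # feedback steps per policy evaluation (fixed in the paper)
POP_SIZE = 50               # SNES population size (as reported)
LR_MU, LR_SIGMA = 1.0, 0.1  # assumed learning rates; the paper does not list them

def train_dasnet(evaluate_policy, dim, generations=100, seed=0):
    """Sketch of the SNES outer loop; evaluate_policy(theta) is hypothetical."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)    # mean of the search distribution over policy weights
    sigma = np.ones(dim)  # per-dimension std. dev. (the "separable" in SNES)
    for _ in range(generations):
        noise = rng.standard_normal((POP_SIZE, dim))
        thetas = mu + sigma * noise  # sample candidate attention policies
        fitness = np.array([evaluate_policy(th) for th in thetas])
        # Rank-based utilities, as is standard for NES-family methods.
        order = np.argsort(-fitness)
        ranks = np.empty(POP_SIZE)
        ranks[order] = np.arange(POP_SIZE)
        util = np.maximum(0.0, np.log(POP_SIZE / 2 + 1) - np.log(ranks + 1))
        util = util / util.sum() - 1.0 / POP_SIZE
        # SNES natural-gradient updates for the mean and per-dimension sigma.
        mu += LR_MU * sigma * (util @ noise)
        sigma *= np.exp(0.5 * LR_SIGMA * (util @ (noise**2 - 1.0)))
    return mu
```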
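The λ values in the experiment-setup row suggest a fitness that weights misclassified samples far more heavily than correctly classified ones, plus an L2 penalty on the policy weights. A hedged sketch, assuming per-sample cross-entropy losses and a correctness mask; the paper's exact aggregation may differ:

```python
import numpy as np

LAMBDA_CORRECT = 0.005       # weight on samples the network classifies correctly
LAMBDA_MISCLASSIFIED = 1.0   # misclassified samples dominate the objective
LAMBDA_L2 = 0.005            # L2 penalty on the policy weights theta

def weighted_loss(theta, losses, correct_mask):
    # losses: per-sample cross-entropy after T feedback steps (assumption)
    # correct_mask: True where the final prediction is correct
    sample_weights = np.where(correct_mask, LAMBDA_CORRECT, LAMBDA_MISCLASSIFIED)
    return np.mean(sample_weights * losses) + LAMBDA_L2 * np.dot(theta, theta)
```

Since SNES maximizes fitness, the negative of this loss would be what `evaluate_policy` returns in the loop above.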
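The setup also names global contrast normalization (GCN) followed by ZCA whitening as the preprocessing applied before training M. A self-contained sketch of that standard pipeline; the epsilon values are assumptions, as the paper does not report them:

```python
import numpy as np

def gcn(X, eps=1e-8):
    """Per-image contrast normalization; X has shape (n, d) of flattened images."""
    X = X - X.mean(axis=1, keepdims=True)
    norm = np.sqrt((X ** 2).mean(axis=1, keepdims=True))
    return X / np.maximum(norm, eps)

def zca_fit(X, eps=1e-2):
    """Fit the ZCA whitening matrix on (GCN-normalized) training images."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return mean, W

def zca_apply(X, mean, W):
    """Whiten images with statistics fitted on the training set."""
    return (X - mean) @ W
```

The whitening matrix would be fitted on the 5 × 10^4 training images only and then applied unchanged to the test set.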