Deep Networks with Internal Selective Attention through Feedback Connections

Authors: Marijn F Stollenga, Jonathan Masci, Faustino Gomez, Jürgen Schmidhuber

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the CIFAR-10 and CIFAR-100 datasets, dasNet outperforms the previous state-of-the-art model on unaugmented datasets.
Researcher Affiliation | Academia | IDSIA, USI-SUPSI, Manno-Lugano, Switzerland; {marijn,jonathan,tino,juergen}@idsia.ch
Pseudocode | Yes | Algorithm 1 TRAIN DASNET (M, µ, Σ, p, n). (A hedged sketch of the SNES training loop this algorithm relies on follows the table.)
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a direct link to a repository.
Open Datasets | Yes | On the CIFAR-10 and CIFAR-100 datasets, dasNet outperforms the previous state-of-the-art model on unaugmented datasets. The CIFAR-10 dataset [20] is composed of 32 × 32 colour images split into 5 × 10^4 training and 10^4 testing samples, where each image is assigned to one of 10 classes. The CIFAR-100 dataset is composed similarly, but contains 100 classes.
Dataset Splits | No | The paper mentions 5 × 10^4 training and 10^4 testing samples for CIFAR-10, but does not explicitly state a validation split.
Hardware Specification | Yes | Training of dasNet took around 4 days on a GTX 560 Ti GPU, excluding the time originally used to train M.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments.
Experiment Setup | Yes | The number of steps was experimentally determined and fixed at T = 5: small enough to be computationally tractable while still allowing for enough interaction. In all experiments, λcorrect = 0.005, λmisclassified = 1, and λL2 = 0.005. The Maxout network M was trained with data augmentation following global contrast normalization and ZCA normalization. The model consists of three convolutional maxout layers followed by a fully connected maxout layer and softmax outputs. Dropout of 0.5 was used in all layers except the input layer, where 0.2 was used. The population size for SNES was set to 50. (A hedged sketch of the T-step feedback loop follows the table.)
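
The experiment-setup row above fixes the number of feedback steps at T = 5. The minimal sketch below illustrates only that control flow: an image is classified repeatedly, and after each pass an attention policy rescales the feature maps before the next pass. The tiny numpy "network", the gate formula, and all names (dasnet_predict, features, readout, policy) are illustrative assumptions, not the authors' implementation; only T = 5 is taken from the paper.

```python
import numpy as np

T = 5  # number of feedback steps, fixed at 5 in the paper

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dasnet_predict(x, features, readout, policy):
    """Classify x with T passes; each pass rescales the toy "feature maps"
    with gates produced by the attention policy from the previous pass."""
    num_maps = features.shape[0]
    gates = np.ones(num_maps)                    # neutral attention on the first pass
    for _ in range(T):
        maps = np.tanh(features @ x) * gates     # toy feature maps, scaled by gates
        probs = softmax(readout @ maps)          # toy classifier readout
        summary = np.concatenate([maps, probs])  # policy sees map summaries + class probs
        gates = 1.0 + np.tanh(policy @ summary)  # gates for the next pass (assumed form)
    return probs

# Toy usage with random parameters (shapes are illustrative only).
rng = np.random.default_rng(0)
num_inputs, num_maps, num_classes = 8, 6, 10
x = rng.standard_normal(num_inputs)
features = rng.standard_normal((num_maps, num_inputs))
readout = rng.standard_normal((num_classes, num_maps))
policy = rng.standard_normal((num_maps, num_maps + num_classes))
print(dasnet_predict(x, features, readout, policy).round(3))
```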
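The pseudocode row refers to Algorithm 1 TRAIN DASNET (M, µ, Σ, p, n), which optimises the attention policy with Separable Natural Evolution Strategies (SNES) rather than backpropagation, using the population size of 50 reported in the experiment setup. Below is a self-contained, hedged sketch of a generic SNES loop of the kind that algorithm relies on; the learning rates, utility weighting, and the toy quadratic fitness are standard SNES choices and assumptions, not details taken from the paper. The paper's actual fitness combines classification losses weighted by λcorrect and λmisclassified plus an L2 term on the policy parameters, which the placeholder fitness merely stands in for.

```python
import numpy as np

def snes(fitness, mu, sigma, population=50, iterations=200,
         lr_mu=1.0, lr_sigma=None, rng=None):
    """Sample a Gaussian population around (mu, sigma), rank candidates by
    fitness, and update mean and per-dimension std with rank-based utilities."""
    rng = np.random.default_rng() if rng is None else rng
    d = mu.size
    # default std learning rate commonly recommended for separable NES
    lr_sigma = (3 + np.log(d)) / (5 * np.sqrt(d)) if lr_sigma is None else lr_sigma
    # rank-based utility weights: best candidate gets the largest weight
    ranks = np.arange(1, population + 1)
    util = np.maximum(0.0, np.log(population / 2 + 1) - np.log(ranks))
    util = util / util.sum() - 1.0 / population
    for _ in range(iterations):
        eps = rng.standard_normal((population, d))   # standard-normal noise
        candidates = mu + sigma * eps                # candidate parameter vectors
        scores = np.array([fitness(c) for c in candidates])
        u = np.empty(population)
        u[np.argsort(-scores)] = util                # assign utilities, best first
        mu = mu + lr_mu * sigma * (u @ eps)          # natural-gradient step on the mean
        sigma = sigma * np.exp(0.5 * lr_sigma * (u @ (eps ** 2 - 1.0)))
    return mu, sigma

# Toy usage: maximise a simple quadratic fitness (a cheap placeholder for the
# paper's dasNet classification fitness, which is far more expensive to evaluate).
best_mu, _ = snes(lambda theta: -np.sum((theta - 3.0) ** 2),
                  mu=np.zeros(10), sigma=np.ones(10))
print(best_mu.round(2))
```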