Dynamic Capacity Networks

Authors: Amjad Almahairi, Nicolas Ballas, Tim Cooijmans, Yin Zheng, Hugo Larochelle, Aaron Courville

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We focus our empirical evaluation on the Cluttered MNIST and SVHN image datasets. Our findings indicate that DCNs are able to drastically reduce the number of computations, compared to traditional convolutional neural networks, while maintaining similar or even better performance."
Researcher Affiliation | Collaboration | Amjad Almahairi (AMJAD.ALMAHAIRI@UMONTREAL.CA), Nicolas Ballas (NICOLAS.BALLAS@UMONTREAL.CA), Tim Cooijmans (TIM.COOIJMANS@UMONTREAL.CA), Yin Zheng (YIN.ZHENG@HULU.COM), Hugo Larochelle (HLAROCHELLE@TWITTER.COM), Aaron Courville (AARON.COURVILLE@UMONTREAL.CA); MILA, Université de Montréal, Québec, Canada; Hulu LLC, Beijing, China; Twitter, Cambridge, MA, USA
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | "We focus our empirical evaluation on the Cluttered MNIST and SVHN image datasets."; "We use the 100×100 Cluttered MNIST digit classification dataset (Mnih et al., 2014)."; "transcribing multi-digit sequences from natural images using the Street View House Numbers (SVHN) dataset (Netzer et al., 2011)."
Dataset Splits | Yes | Cluttered MNIST: "The dataset has the same size of MNIST: 60000 images for training and 10000 for testing." SVHN: "The dataset has three subsets: train (33k), extra (202k) and test (13k). In the following, we trained our models on 230k images from both the train and extra subsets, where we take a 5k random sample as a validation set for choosing hyperparameters." (An illustrative split sketch follows the table.)
Hardware Specification | Yes | "We evaluate all models on an NVIDIA Titan Black GPU card."
Software Dependencies | No | "We use Batch Normalization (Ioffe & Szegedy, 2015) and Adam (Kingma & Ba, 2014) for training our models."; "The authors would like to thank the developers of Theano (Bergstra et al., 2011; Bastien et al., 2012) and Blocks/Fuel (van Merriënboer et al., 2015) for developing such powerful tools for scientific computing". Specific version numbers for these software components are not provided.
Experiment Setup | Yes | Cluttered MNIST: "Coarse layers: 2 convolutional layers, with 7×7 and 3×3 filter sizes, 12 and 24 filters, respectively, and a 2×2 stride. ... Fine layers: 5 convolutional layers, each with 3×3 filter sizes, 1×1 strides, and 24 filters. ... We use rectifier non-linearities in all layers. We use Batch Normalization (Ioffe & Szegedy, 2015) and Adam (Kingma & Ba, 2014) for training our models." SVHN: "While training, we take 54×110 random crop from images, and we use 0.2 dropout on convolutional layers and 0.5 dropout on fully connected layers." (An illustrative architecture sketch follows the table.)
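
The SVHN split quoted in the Dataset Splits row (train 33k + extra 202k, with a random 5k sample held out for validation, leaving roughly 230k training images) can be illustrated with a minimal sketch. The function name, variable names, seed, and use of NumPy below are assumptions for illustration; only the subset sizes and the 5k random validation sample come from the paper.

```python
# Minimal sketch of the SVHN split described above: pool the train (~33k) and
# extra (~202k) subsets, then hold out a random 5k sample for validation,
# leaving ~230k images for training. Names and seed are hypothetical.
import numpy as np

def make_svhn_split(train_images, train_labels, extra_images, extra_labels,
                    valid_size=5000, seed=0):
    images = np.concatenate([train_images, extra_images], axis=0)
    labels = np.concatenate([train_labels, extra_labels], axis=0)

    rng = np.random.RandomState(seed)
    order = rng.permutation(len(images))
    valid_idx, train_idx = order[:valid_size], order[valid_size:]

    return ((images[train_idx], labels[train_idx]),
            (images[valid_idx], labels[valid_idx]))
```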
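
The Cluttered MNIST coarse/fine configuration quoted in the Experiment Setup row can likewise be summarized in code. This is a sketch, not the authors' implementation: the PyTorch framework, module names, single-channel input, and padding defaults are assumptions; only the layer counts, filter counts, kernel sizes, strides, rectifier non-linearities, and Batch Normalization come from the quoted description.

```python
# Sketch of the coarse and fine feature extractors described for Cluttered
# MNIST. Only layer/filter counts, kernel sizes, strides, ReLU, and BatchNorm
# are taken from the paper; framework and plumbing are illustrative.
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, kernel, stride):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel, stride=stride),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Coarse layers: 2 convolutional layers, 7x7 then 3x3 filters,
# 12 and 24 feature maps, each with a 2x2 stride.
coarse_layers = nn.Sequential(
    conv_bn_relu(1, 12, kernel=7, stride=2),   # in_ch=1 assumes grayscale input
    conv_bn_relu(12, 24, kernel=3, stride=2),
)

# Fine layers: 5 convolutional layers, 3x3 filters, 1x1 stride, 24 maps each.
fine_layers = nn.Sequential(
    conv_bn_relu(1, 24, kernel=3, stride=1),
    *[conv_bn_relu(24, 24, kernel=3, stride=1) for _ in range(4)],
)
```

Per the quoted setup, training would use Adam over these parameters (e.g. torch.optim.Adam in this sketch); the 0.2/0.5 dropout rates apply to the SVHN configuration, not to this Cluttered MNIST one.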