Swapout: Learning an ensemble of deep architectures

Authors: Saurabh Singh, Derek Hoiem, David Forsyth

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experiment extensively on the CIFAR-10 dataset and demonstrate that a model trained with swapout outperforms a comparable ResNet model. Further, a 32 layer wider model matches the performance of a 1001 layer ResNet on both CIFAR-10 and CIFAR-100 datasets."
Researcher Affiliation | Academia | "Department of Computer Science, University of Illinois, Urbana-Champaign {ss1, dhoiem, daf}@illinois.edu"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | "We experiment extensively on the CIFAR-10 dataset and demonstrate that a model trained with swapout outperforms a comparable ResNet model. Further, a 32 layer wider model matches the performance of a 1001 layer ResNet on both CIFAR-10 and CIFAR-100 datasets."
Dataset Splits | No | The paper reports error rates on CIFAR-10 and CIFAR-100 but does not explicitly detail the training/validation/test splits; it notes only that "Standard augmentation of left-right flips and random translations of up to four pixels is used" and that "All the images in a mini-batch use the same crop."
Hardware Specification | No | The paper does not explicitly specify the hardware used for its experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | "Training: We train using SGD with a batch size of 128, momentum of 0.9 and weight decay of 0.0001. Unless otherwise specified, we train all the models for a total of 256 epochs. Starting from an initial learning rate of 0.1, we drop it by a factor of 10 after 192 epochs and then again after 224 epochs."
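To make the quoted Experiment Setup and Dataset Splits details concrete, the sketch below wires up the stated CIFAR-10 augmentation and optimization schedule in PyTorch. This is not the authors' code: the paper does not name a framework or release an implementation, so the `resnet18` stand-in model, the dataset path, and the per-image random crop are assumptions (the paper additionally states that all images in a mini-batch share the same crop, which this sketch does not reproduce).

```python
# Minimal sketch (not the authors' code) of the training setup quoted above.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# "Standard augmentation of left-right flips and random translations of up to
# four pixels" -- implemented here as the usual per-image flip and padded crop.
train_tf = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10)  # stand-in, not swapout
criterion = nn.CrossEntropyLoss()

# "SGD with a batch size of 128, momentum of 0.9 and weight decay of 0.0001"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# "drop it by a factor of 10 after 192 epochs and then again after 224 epochs"
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[192, 224],
                                                 gamma=0.1)

for epoch in range(256):  # "a total of 256 epochs"
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```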