Swapout: Learning an ensemble of deep architectures
Authors: Saurabh Singh, Derek Hoiem, David Forsyth
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment extensively on the CIFAR-10 dataset and demonstrate that a model trained with swapout outperforms a comparable ResNet model. Further, a 32 layer wider model matches the performance of a 1001 layer ResNet on both CIFAR-10 and CIFAR-100 datasets. |
| Researcher Affiliation | Academia | Department of Computer Science University of Illinois, Urbana-Champaign {ss1, dhoiem, daf}@illinois.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | We experiment extensively on the CIFAR-10 dataset and demonstrate that a model trained with swapout outperforms a comparable ResNet model. Further, a 32 layer wider model matches the performance of a 1001 layer ResNet on both CIFAR-10 and CIFAR-100 datasets. |
| Dataset Splits | No | The paper mentions training on CIFAR-10 and CIFAR-100 and reports error rates, but it does not explicitly describe the training/validation/test splits, noting only that "Standard augmentation of left-right flips and random translations of up to four pixels is used." and "All the images in a mini-batch use the same crop." |
| Hardware Specification | No | The paper mentions training settings but does not describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Training: We train using SGD with a batch size of 128, momentum of 0.9 and weight decay of 0.0001. Unless otherwise specified, we train all the models for a total 256 epochs. Starting from an initial learning rate of 0.1, we drop it by a factor of 10 after 192 epochs and then again after 224 epochs. |
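
The optimization schedule quoted in the Experiment Setup row maps directly onto a standard SGD training loop with step learning-rate decay. The following is a minimal PyTorch sketch under those hyperparameters; the placeholder network, data-loading details, and per-image cropping are assumptions for illustration and do not reproduce the paper's swapout architecture.

```python
import torch
import torchvision
import torchvision.transforms as T

# Augmentation quoted in the paper: left-right flips and random translations of
# up to four pixels (approximated here by a padded random crop).
train_tf = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)

# Placeholder classifier; the paper trains swapout/ResNet models, which are
# not reproduced in this sketch.
net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
)
criterion = torch.nn.CrossEntropyLoss()

# SGD with batch size 128, momentum 0.9, weight decay 0.0001; initial learning
# rate 0.1, dropped by a factor of 10 after epochs 192 and 224, 256 epochs total.
optimizer = torch.optim.SGD(net.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[192, 224],
                                                 gamma=0.1)

for epoch in range(256):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(net(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Note that torchvision's `RandomCrop` samples a crop independently for each image, whereas the paper states that all images in a mini-batch share the same crop; reproducing that detail would require a custom batch-level transform.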