BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
Authors: Jack Turner, Elliot J. Crowley, Michael O'Boyle, Amos Storkey, Gavin Gray
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. |
| Researcher Affiliation | Academia | Jack Turner, Elliot J. Crowley, Michael O'Boyle, Amos Storkey, Gavin Gray; School of Informatics, University of Edinburgh; {jack.turner,elliot.j.crowley}@ed.ac.uk, mob@inf.ed.ac.uk, {a.storkey,g.d.b.gray}@ed.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/BayesWatch/pytorch-blockswap. |
| Open Datasets | Yes | Here, we evaluate student networks obtained using BlockSwap on the CIFAR-10 image classification dataset (Krizhevsky, 2009). Here, we demonstrate that students chosen by BlockSwap succeed on the more challenging ImageNet dataset (Russakovsky et al., 2015). Thus far, we have used BlockSwap for image classification problems. Here we observe whether it extends to object detection on the COCO dataset (Lin et al., 2014), specifically training on 2017 train and evaluating on 2017 val. |
| Dataset Splits | Yes | Here, we evaluate student networks obtained using BlockSwap on the CIFAR-10 image classification dataset (Krizhevsky, 2009). Here, we demonstrate that students chosen by BlockSwap succeed on the more challenging ImageNet dataset (Russakovsky et al., 2015). Top-1 and top-5 validation errors are presented in Table 3. Thus far, we have used BlockSwap for image classification problems. Here we observe whether it extends to object detection on the COCO dataset (Lin et al., 2014), specifically training on 2017 train and evaluating on 2017 val. |
| Hardware Specification | Yes | BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. under 5 minutes on a single GPU for CIFAR-10). By comparison, a BlockSwap search for an 800K parameter network took less than 5 minutes using a single Titan X Pascal GPU. Minibatches of size 256 split across 4 GPUs are used with standard crop + flip augmentation. We use a batch-size of 16 split across 8 GPUs. A hedged sketch of the Fisher-style ranking appears after the table. |
| Software Dependencies | No | The paper mentions 'Torchvision' in the implementation details for COCO detection but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Implementation Details: Networks are trained for 200 epochs using SGD with momentum 0.9. The initial learning rate of 0.1 is cosine annealed (Loshchilov & Hutter, 2017) to zero across the training run. Minibatches of size 128 are used with standard crop + flip data augmentation and Cutout (DeVries & Taylor, 2017). The weight decay factor is set to 0.0005. For attention transfer, β is set to 1000. A hedged sketch of this recipe appears after the table. |
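
The ranking system quoted in the Hardware Specification row is the paper's Fisher potential: candidate block configurations are scored from a single labelled minibatch rather than trained. Below is a minimal sketch of such a Fisher-style score in PyTorch (the paper's framework). The function name `fisher_potential`, the hook bookkeeping, and the exact constant factors are our assumptions, not the authors' API; the real implementation lives in the linked `pytorch-blockswap` repository.

```python
# Hedged sketch: Fisher-style scoring of a candidate network from one
# forward/backward pass. Assumes conv feature maps of shape (N, C, H, W)
# and a model in training mode so activations carry gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fisher_potential(model: nn.Module, blocks: dict, x, y) -> float:
    """Score a candidate network from a single labelled minibatch by
    summing a Fisher approximation over its blocks' output activations."""
    acts = {}

    def save(name):
        def hook(_module, _inputs, out):
            out.retain_grad()        # keep dL/d(activation) after backward()
            acts[name] = out
        return hook

    handles = [b.register_forward_hook(save(n)) for n, b in blocks.items()]
    F.cross_entropy(model(x), y).backward()

    score = 0.0
    for a in acts.values():
        g = a.grad                                   # (N, C, H, W)
        # Per-channel Fisher approximation: squared, spatially summed
        # activation-gradient product, averaged over the batch.
        per_channel = (a * g).sum(dim=(2, 3)).pow(2).mean(dim=0)
        score += per_channel.sum().item()

    for h in handles:
        h.remove()
    return score
```

Under this scheme, candidates sampled within a parameter budget are ranked by the returned score and only the highest-ranked student is trained to convergence, which is what makes the quoted sub-5-minute search time plausible.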
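The Experiment Setup row pins down the CIFAR-10 recipe precisely enough to sketch in code. The snippet below is a hedged sketch, not the authors' training script: `student`, `teacher`, and `train_loader` (minibatches of 128 with crop + flip + Cutout) are assumed to exist, both networks are assumed to return intermediate features alongside logits, and the attention-map form follows Zagoruyko & Komodakis (2017), on which the paper's attention transfer is based.

```python
# Hedged sketch of the quoted recipe: SGD with momentum 0.9, weight decay
# 5e-4, initial LR 0.1 cosine-annealed to zero over 200 epochs, and an
# attention-transfer term weighted by beta = 1000.
import torch
import torch.nn.functional as F

EPOCHS, BETA = 200, 1000.0

def at_loss(fs, ft):
    """Match normalised spatial attention maps of student/teacher features."""
    def attention(f):                           # f: (N, C, H, W)
        a = f.pow(2).mean(dim=1).flatten(1)     # channel-pooled map, (N, H*W)
        return F.normalize(a, dim=1)
    return (attention(fs) - attention(ft)).pow(2).mean()

optimizer = torch.optim.SGD(student.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS, eta_min=0.0)

for epoch in range(EPOCHS):
    for x, y in train_loader:
        optimizer.zero_grad()
        logits, s_feats = student(x)            # assumed (logits, features)
        with torch.no_grad():
            _, t_feats = teacher(x)             # teacher is frozen
        loss = F.cross_entropy(logits, y)
        loss = loss + BETA * sum(at_loss(s, t)
                                 for s, t in zip(s_feats, t_feats))
        loss.backward()
        optimizer.step()
    scheduler.step()                            # anneal LR once per epoch
```

Stepping the cosine schedule once per epoch with `T_max=EPOCHS` drives the learning rate to zero exactly at epoch 200, matching the quoted annealing.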