BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
Authors: Jack Turner, Elliot J. Crowley, Michael O'Boyle, Amos Storkey, Gavin Gray
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. |
| Researcher Affiliation | Academia | Jack Turner, Elliot J. Crowley, Michael O'Boyle, Amos Storkey, Gavin Gray; School of Informatics, University of Edinburgh; {jack.turner,elliot.j.crowley}@ed.ac.uk, mob@inf.ed.ac.uk, {a.storkey,g.d.b.gray}@ed.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/BayesWatch/pytorch-blockswap. |
| Open Datasets | Yes | Here, we evaluate student networks obtained using BlockSwap on the CIFAR-10 image classification dataset (Krizhevsky, 2009). Here, we demonstrate that students chosen by BlockSwap succeed on the more challenging ImageNet dataset (Russakovsky et al., 2015). Thus far, we have used BlockSwap for image classification problems. Here we observe whether it extends to object detection on the COCO dataset (Lin et al., 2014), specifically training on 2017 train and evaluating on 2017 val. |
| Dataset Splits | Yes | Here, we evaluate student networks obtained using BlockSwap on the CIFAR-10 image classification dataset (Krizhevsky, 2009). Here, we demonstrate that students chosen by BlockSwap succeed on the more challenging ImageNet dataset (Russakovsky et al., 2015). Top-1 and top-5 validation errors are presented in Table 3. Thus far, we have used BlockSwap for image classification problems. Here we observe whether it extends to object detection on the COCO dataset (Lin et al., 2014), specifically training on 2017 train and evaluating on 2017 val. |
| Hardware Specification | Yes | BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. under 5 minutes on a single GPU for CIFAR-10). By comparison, a BlockSwap search for an 800K parameter network took less than 5 minutes using a single Titan X Pascal GPU. Minibatches of size 256 split across 4 GPUs are used with standard crop + flip augmentation. We use a batch-size of 16 split across 8 GPUs. A hedged sketch of the Fisher-style ranking appears after the table. |
| Software Dependencies | No | The paper mentions 'Torchvision' in the implementation details for COCO detection but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Implementation Details: Networks are trained for 200 epochs using SGD with momentum 0.9. The initial learning rate of 0.1 is cosine annealed (Loshchilov & Hutter, 2017) to zero across the training run. Minibatches of size 128 are used with standard crop + flip data augmentation and Cutout (DeVries & Taylor, 2017). The weight decay factor is set to 0.0005. For attention transfer, β is set to 1000. A hedged sketch of this recipe appears after the table. |
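
The ranking system quoted in the Hardware Specification row is the paper's Fisher potential: candidate block configurations are scored from a single labelled minibatch rather than trained. Below is a minimal sketch of such a Fisher-style score in PyTorch (the paper's framework). The function name `fisher_potential`, the hook bookkeeping, and the exact constant factors are our assumptions, not the authors' API; the real implementation lives in the linked `pytorch-blockswap` repository.

```python
# Hedged sketch: Fisher-style scoring of a candidate network from one
# forward/backward pass. Assumes conv feature maps of shape (N, C, H, W)
# and a model in training mode so activations carry gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fisher_potential(model: nn.Module, blocks: dict, x, y) -> float:
    """Score a candidate network from a single labelled minibatch by
    summing a Fisher approximation over its blocks' output activations."""
    acts = {}

    def save(name):
        def hook(_module, _inputs, out):
            out.retain_grad()        # keep dL/d(activation) after backward()
            acts[name] = out
        return hook

    handles = [b.register_forward_hook(save(n)) for n, b in blocks.items()]
    F.cross_entropy(model(x), y).backward()

    score = 0.0
    for a in acts.values():
        g = a.grad                                   # (N, C, H, W)
        # Per-channel Fisher approximation: squared, spatially summed
        # activation-gradient product, averaged over the batch.
        per_channel = (a * g).sum(dim=(2, 3)).pow(2).mean(dim=0)
        score += per_channel.sum().item()

    for h in handles:
        h.remove()
    return score
```

Under this scheme, candidates sampled within a parameter budget are ranked by the returned score and only the highest-ranked student is trained to convergence, which is what makes the quoted sub-5-minute search time plausible.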
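The Experiment Setup row pins down the CIFAR-10 recipe precisely enough to sketch in code. The snippet below is a hedged sketch, not the authors' training script: `student`, `teacher`, and `train_loader` (minibatches of 128 with crop + flip + Cutout) are assumed to exist, both networks are assumed to return intermediate features alongside logits, and the attention-map form follows Zagoruyko & Komodakis (2017), on which the paper's attention transfer is based.

```python
# Hedged sketch of the quoted recipe: SGD with momentum 0.9, weight decay
# 5e-4, initial LR 0.1 cosine-annealed to zero over 200 epochs, and an
# attention-transfer term weighted by beta = 1000.
import torch
import torch.nn.functional as F

EPOCHS, BETA = 200, 1000.0

def at_loss(fs, ft):
    """Match normalised spatial attention maps of student/teacher features."""
    def attention(f):                           # f: (N, C, H, W)
        a = f.pow(2).mean(dim=1).flatten(1)     # channel-pooled map, (N, H*W)
        return F.normalize(a, dim=1)
    return (attention(fs) - attention(ft)).pow(2).mean()

optimizer = torch.optim.SGD(student.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS, eta_min=0.0)

for epoch in range(EPOCHS):
    for x, y in train_loader:
        optimizer.zero_grad()
        logits, s_feats = student(x)            # assumed (logits, features)
        with torch.no_grad():
            _, t_feats = teacher(x)             # teacher is frozen
        loss = F.cross_entropy(logits, y)
        loss = loss + BETA * sum(at_loss(s, t)
                                 for s, t in zip(s_feats, t_feats))
        loss.backward()
        optimizer.step()
    scheduler.step()                            # anneal LR once per epoch
```

Stepping the cosine schedule once per epoch with `T_max=EPOCHS` drives the learning rate to zero exactly at epoch 200, matching the quoted annealing.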