StrassenNets: Deep Learning with a Multiplication Budget
Authors: Michael Tschannen, Aran Khanna, Animashree Anandkumar
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on CIFAR-10 and ImageNet show that our method applied to ResNet (He et al., 2016a) yields the same or higher accuracy than existing complexity reduction methods while using considerably fewer multiplications. For example, for ResNet-18 our method reduces the number of multiplications by 99.63% while incurring a top-1 accuracy degradation of only 2.0% compared to the full-precision model on ImageNet. |
| Researcher Affiliation | Collaboration | (1) ETH Zürich, Zürich, Switzerland (most of this work was done while MT was at Amazon AI); (2) Amazon AI, Palo Alto, CA, USA; (3) Caltech, Pasadena, CA, USA. |
| Pseudocode | Yes | see Fig. 1, right, and pseudocode in Appendix C. |
| Open Source Code | Yes | Code available at https://github.com/mitscha/strassennets. |
| Open Datasets | Yes | We apply our method to all convolution layers... of the ResNet architecture (He et al., 2016a) to create the so-called Strassen-ResNet (ST-ResNet). We evaluate ST-ResNet on CIFAR-10 (10 classes, 50k training images, 10k testing images) (Krizhevsky & Hinton, 2009) and ImageNet (ILSVRC2012; 1k classes, 1.2M training images, 50k testing images) (Russakovsky et al., 2015) for different choices of r, p, g, and compare the accuracy of ST-ResNet to related works. All models were trained from scratch... We apply our method to the character-level language model described in (Kim et al., 2016a) and evaluate it on the English Penn Treebank (PTB with word vocabulary size 10k, character vocabulary size 51, 1M training tokens, standard train-validation-test split, see (Kim et al., 2016a)) (Marcus et al., 1993). |
| Dataset Splits | Yes | The validation accuracy is computed from center crops. ... We apply our method to the character-level language model described in (Kim et al., 2016a) and evaluate it on the English Penn Treebank (PTB with word vocabulary size 10k, character vocabulary size 51, 1M training tokens, standard train-validation-test split, see (Kim et al., 2016a)) (Marcus et al., 1993). |
| Hardware Specification | No | The paper mentions support by 'AWS Cloud Credits for Research program' and discusses future work related to 'FPGAs and ASICs' as target platforms, but it does not specify the exact hardware (e.g., GPU/CPU models, memory) used for conducting the experiments described in the paper. |
| Software Dependencies | No | The paper mentions 'SGD' as an optimizer but does not specify any software platforms (e.g., TensorFlow, PyTorch) or libraries with their version numbers that were used to implement and run the experiments. |
| Experiment Setup | Yes | We generate a training set containing 100k pairs (Ai, Bi) with entries i.i.d. uniform on [−1, 1], train the SPN with full-precision weights (initialized i.i.d. uniform on [−1, 1]) for one epoch with SGD (learning rate 0.1, momentum 0.9, mini-batch size 4), activate quantization, and train for another epoch (with learning rate 0.001). ... We train for 250 epochs with initial learning rate 0.1 and mini-batch size 128, multiplying the learning rate by 0.1 after 150 and 200 epochs. ... We use an initial learning rate of 0.05 and mini-batch size 256, with two different learning rate schedules... All models are trained for 40 epochs using SGD with mini-batch size 20 and initial learning rate 2... |
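
The multiplication savings quoted in the table come from expressing (approximate) matrix products as two-layer sum-product networks (SPNs) with ternary transition matrices, the construction referenced above via Fig. 1 and the pseudocode in Appendix C. The snippet below is a minimal NumPy illustration of that SPN form, instantiated with the classic Strassen matrices for the 2×2 case (r = 7 multiplications instead of the naive 8). The row-major vectorization and the names `Wa`, `Wb`, `Wc` are choices made here for illustration, not taken from the authors' code.

```python
import numpy as np

# Ternary matrices (entries in {-1, 0, 1}) encoding Strassen's 2x2 algorithm
# as a two-layer SPN: vec(C) = Wc @ ((Wb @ vec(B)) * (Wa @ vec(A))).
# The hidden width r = 7 is the number of (real-valued) multiplications.
Wa = np.array([[ 1, 0, 0, 1],    # a11 + a22
               [ 0, 0, 1, 1],    # a21 + a22
               [ 1, 0, 0, 0],    # a11
               [ 0, 0, 0, 1],    # a22
               [ 1, 1, 0, 0],    # a11 + a12
               [-1, 0, 1, 0],    # a21 - a11
               [ 0, 1, 0, -1]])  # a12 - a22
Wb = np.array([[ 1, 0, 0, 1],    # b11 + b22
               [ 1, 0, 0, 0],    # b11
               [ 0, 1, 0, -1],   # b12 - b22
               [-1, 0, 1, 0],    # b21 - b11
               [ 0, 0, 0, 1],    # b22
               [ 1, 1, 0, 0],    # b11 + b12
               [ 0, 0, 1, 1]])   # b21 + b22
Wc = np.array([[1,  0, 0, 1, -1, 0, 1],   # c11
               [0,  0, 1, 0,  1, 0, 0],   # c12
               [0,  1, 0, 1,  0, 0, 0],   # c21
               [1, -1, 1, 0,  0, 1, 0]])  # c22

A = np.random.uniform(-1, 1, (2, 2))
B = np.random.uniform(-1, 1, (2, 2))
a, b = A.reshape(-1), B.reshape(-1)   # row-major vectorization (an assumption on ordering)
c = Wc @ ((Wb @ b) * (Wa @ a))        # only 7 real multiplications in the elementwise product
assert np.allclose(c.reshape(2, 2), A @ B)
```

In the synthetic experiment quoted in the Experiment Setup row, the ternary matrices are not hard-coded as above but learned from the 100k random (Ai, Bi) pairs, first with full-precision weights and then with quantization activated.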
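
Since the paper does not name its training framework (see the Software Dependencies row), the following PyTorch sketch is only an assumed rendering of the quoted CIFAR-10 schedule: 250 epochs, mini-batch size 128, initial learning rate 0.1, decayed by 0.1 after epochs 150 and 200. The placeholder model and the momentum value are assumptions for illustration, not taken from the paper.

```python
import torch

# Hypothetical stand-in model; the actual ST-ResNet definition is in the
# authors' repository (https://github.com/mitscha/strassennets).
model = torch.nn.Linear(3 * 32 * 32, 10)

# Quoted recipe: SGD, mini-batch size 128, initial learning rate 0.1,
# multiplied by 0.1 after epochs 150 and 200, for 250 epochs total.
# Momentum 0.9 is an assumption; it is only quoted for the synthetic experiment.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 200], gamma=0.1)

for epoch in range(250):
    # ... one pass over the CIFAR-10 training set with batch size 128 ...
    scheduler.step()
```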