Structured Transforms for Small-Footprint Deep Learning

Authors: Vikas Sindhwani, Tara Sainath, Sanjiv Kumar

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that these transforms can significantly accelerate inference and forward/backward passes during training, and offer superior accuracy-compactness-speed tradeoffs in comparison to a number of existing techniques."
Researcher Affiliation | Industry | "Vikas Sindhwani, Tara N. Sainath, Sanjiv Kumar. Google, New York. {sindhwani, tsainath, sanjivk}@google.com"
Pseudocode | Yes | "Theorem 3.3 (Fast Multiplication). Given an n × b matrix X, the matrix-matrix product Y = (Σ_{i=1}^r Z_1(g_i) Z_{-1}(h_i)) X can be computed at the cost of 2(rb + b + r) FFTs, using the following algorithm. Set η = [1, η, η², ..., η^(n-1)]ᵀ where η = (-1)^(1/n) = exp(iπ/n); initialize Y = 0_{n×b}; set X̂ = fft(diag(η) X); set Ĝ = fft(G) = [ĝ_1 ... ĝ_r] and Ĥ = fft(diag(η) H) = [ĥ_1 ... ĥ_r]; for i = 1 to r: U = Z_{-1}(h_i) X = diag(η̄) ifft(diag(ĥ_i) X̂), V = diag(ĝ_i) fft(U), Y = Y + V; finally set Y = ifft(Y) and return Y." (A runnable sketch of this procedure is given below the table.)
Open Source Code | No | The paper cites supplementary material at http://vikas.sindhwani.org/st_supplementary.pdf (a PDF) and references the third-party FFTW library, but does not provide a link to, or statement about releasing, its own source code.
Open Datasets | Yes | "MNIST is the original 10-class MNIST digit classification dataset with 60000 training examples and 10000 test examples. We refer the reader to [23] for more details about the datasets." (Reference [23] is: T. Sainath and C. Parada. Convolutional neural networks for small-footprint keyword spotting. In Proc. Interspeech, 2015.)
Dataset Splits | Yes | "The utterances were randomly split into training, development and evaluation sets in the ratio of 80 : 5 : 15."
Hardware Specification | Yes | "6-core 32-GB Intel(R) Xeon(R) machine; random datasets."
Software Dependencies | No | The paper mentions "FFT implementations (we use FFTW: http://www.fftw.org)" but does not give a version number for FFTW or for any other software component.
Experiment Setup | Yes | "The global learning rate is set to 0.002, while our structured transform layers use a layer-specific learning rate of 0.0005; both are decayed by an exponential factor of 0.1."
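
The pseudocode quoted in the table above is Theorem 3.3's FFT-based routine for multiplying the structured matrix Σ_{i=1}^r Z_1(g_i) Z_{-1}(h_i), where Z_f(v) denotes the f-circulant matrix with first column v, against an n × b input X. The NumPy sketch below is a minimal, unofficial rendering of those steps; the helper z_f, the name structured_matmul, and the dense correctness check are illustrative additions, not code released with the paper.

```python
import numpy as np

def z_f(v, f):
    """Dense f-circulant matrix with first column v (used only for verification)."""
    n = len(v)
    M = np.empty((n, n), dtype=complex)
    for k in range(n):
        col = np.roll(v, k).astype(complex)
        col[:k] *= f                      # wrapped-around entries pick up the factor f
        M[:, k] = col
    return M

def structured_matmul(G, H, X):
    """Compute (sum_i Z_1(g_i) Z_{-1}(h_i)) X using 2(rb + b + r) FFTs."""
    n, r = G.shape
    eta = np.exp(1j * np.pi * np.arange(n) / n)   # eta_k = exp(i*pi*k/n)
    Xh = np.fft.fft(eta[:, None] * X, axis=0)     # fft(diag(eta) X): b FFTs
    Gh = np.fft.fft(G, axis=0)                    # fft(G): r FFTs
    Hh = np.fft.fft(eta[:, None] * H, axis=0)     # fft(diag(eta) H): r FFTs
    Y = np.zeros_like(Xh)
    for i in range(r):
        # U = Z_{-1}(h_i) X: skew-circulant product via the diag(eta) scaling trick
        U = np.conj(eta)[:, None] * np.fft.ifft(Hh[:, [i]] * Xh, axis=0)
        # accumulate fft(Z_1(g_i) U); the closing ifft is hoisted out of the loop
        Y += Gh[:, [i]] * np.fft.fft(U, axis=0)
    return np.fft.ifft(Y, axis=0)                 # one final batch of b inverse FFTs

# Sanity check against a dense construction on a small random instance.
rng = np.random.default_rng(0)
n, b, r = 8, 3, 2
G = rng.standard_normal((n, r))
H = rng.standard_normal((n, r))
X = rng.standard_normal((n, b))
M = sum(z_f(G[:, i], 1) @ z_f(H[:, i], -1) for i in range(r))
assert np.allclose(structured_matmul(G, H, X), M @ X)
```

The FFT count matches the theorem: b transforms for X, 2r for the generators G and H, 2b per loop iteration (2rb in total), and b for the final inverse transform, giving 2(rb + b + r).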