Delta Networks for Optimized Recurrent Network Computation
Authors: Daniel Neil, Jun Haeng Lee, Tobi Delbruck, Shih-Chii Liu
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that a naive run-time delta network implementation offers modest improvements in the number of memory accesses and computes, but optimized training techniques confer higher accuracy at higher speedup. With these optimizations, we demonstrate a 9X reduction in cost with negligible loss of accuracy for the TIDIGITS audio digit recognition benchmark. Similarly, on the large Wall Street Journal (WSJ) speech recognition benchmark, pretrained networks can also be greatly accelerated as delta networks, and trained delta networks show a 5.7X improvement with negligible loss of accuracy. Finally, on an end-to-end CNN-RNN network trained for steering angle prediction in a driving dataset, the RNN cost can be reduced by a substantial 100X. (See the delta-update sketch after the table.) |
| Researcher Affiliation | Collaboration | Institute of Neuroinformatics, UZH and ETH Zurich, Zurich, Switzerland; Samsung Advanced Institute of Technology, Samsung Electronics, Suwon-Si, Republic of Korea. |
| Pseudocode | No | No pseudocode or algorithm blocks explicitly labeled as "Pseudocode" or "Algorithm" were found in the paper. The paper includes mathematical formulations, but not structured pseudocode. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include a direct link to a code repository. It mentions using third-party libraries like Lasagne and Theano. |
| Open Datasets | Yes | The TIDIGITS dataset was used as an initial evaluation task to study the trajectory evolution of delta networks. Single digits (oh and zero through nine), totalling 2464 digits in the training set and 2486 digits in the test set, were transformed in the standard way (Neil & Liu, 2016). The delta network methodology was applied to an RNN trained on the larger WSJ dataset to determine whether it could produce the same gains as seen with the TIDIGITS dataset. This dataset comprised 81 hours of transcribed speech, as described in (Braun et al., 2016). The open driving dataset from comma.ai (Santana & Hotz, 2016) with 7.25 hours of driving data was used. |
| Dataset Splits | Yes | The TIDIGITS dataset was used as an initial evaluation task to study the trajectory evolution of delta networks. Single digits (oh and zero through nine), totalling 2464 digits in the training set and 2486 digits in the test set, were transformed in the standard way (Neil & Liu, 2016). A very large speedup exceeding 100X in the delta network GRU can be seen in Fig. 7, computed for the steering angle prediction task on 2000 consecutive frames (100s) from the validation set. |
| Hardware Specification | Yes | Reported training time is for a single Nvidia GTX 980 Ti GPU. |
| Software Dependencies | No | The networks were trained with Lasagne (Dieleman et al., 2015) powered by Theano (Bergstra et al., 2010). While specific frameworks are named, no version numbers for Lasagne or Theano are provided. |
| Experiment Setup | Yes | Training time is approximately 8 minutes for a 150-epoch experiment. The Q3.4 (i.e. m = 3 and f = 4) format was used for network activation values in all speech experiments... The driving dataset in Sec. 6.4 used Q2.5 activations. Training time of the CNN feature detector was about 8h for 10k updates with a batch size of 200. Training of the RNN part took about 3h for 5k updates with a batch size of 32 samples consisting of 48 frames/sample. (See the Qm.f quantization sketch below.) |
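
The technique summarized in the Research Type row is the delta network update: a recurrent layer transmits and recomputes only those input components whose change since the last transmitted value exceeds a threshold, so only the corresponding weight columns are fetched and multiplied. Below is a minimal NumPy sketch of that idea, assuming a plain matrix-vector pre-activation; the class name `DeltaMatVec`, the threshold value, and the dimensions are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

class DeltaMatVec:
    """Minimal sketch of a delta-network matrix-vector product.

    Rather than recomputing W @ x at every timestep, the layer keeps a
    running pre-activation z and adds W @ dx only for components of x
    whose change exceeds the threshold theta. Illustrative only.
    """

    def __init__(self, W, theta=0.0625):
        self.W = W                            # (out_dim, in_dim) weight matrix
        self.theta = theta                    # delta threshold (here one Q3.4 step)
        self.x_ref = np.zeros(W.shape[1])     # last transmitted input values
        self.z = np.zeros(W.shape[0])         # accumulated pre-activation

    def step(self, x):
        delta = x - self.x_ref
        changed = np.abs(delta) > self.theta  # components worth updating
        # Only the weight columns for changed inputs are touched; this is
        # where the memory-access and compute savings come from.
        self.z += self.W[:, changed] @ delta[changed]
        self.x_ref[changed] = x[changed]      # update reference only where sent
        return self.z

# Usage: a slowly varying input triggers few column updates per timestep.
rng = np.random.default_rng(0)
layer = DeltaMatVec(rng.standard_normal((4, 8)))
x = rng.standard_normal(8)
for t in range(3):
    x = x + 0.01 * rng.standard_normal(8)    # small changes, mostly below theta
    z = layer.step(x)
```

Keeping `x_ref` at the last *transmitted* value (rather than the last observed one) prevents small sub-threshold changes from accumulating into unbounded drift, which is the usual formulation of thresholded delta updates.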
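The Experiment Setup row refers to Qm.f fixed-point activations (m integer bits, f fractional bits). The following sketch shows what rounding to that grid looks like under the conventional Qm.f definition with step 2^-f; `quantize_q` is a hypothetical helper for illustration, not the paper's training code.

```python
import numpy as np

def quantize_q(x, m=3, f=4):
    """Round x to a Qm.f fixed-point grid.

    m integer bits and f fractional bits give a step of 2**-f and a
    representable range of roughly [-2**m, 2**m - 2**-f].
    """
    step = 2.0 ** -f
    lo, hi = -(2.0 ** m), (2.0 ** m) - step
    return np.clip(np.round(x / step) * step, lo, hi)

# Q3.4 activations (speech experiments) vs. Q2.5 (driving dataset)
x = np.array([0.07, 1.234, 9.9, -4.2])
print(quantize_q(x, m=3, f=4))   # step 0.0625,  range [-8, 7.9375]
print(quantize_q(x, m=2, f=5))   # step 0.03125, range [-4, 3.96875]
```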