Delta Networks for Optimized Recurrent Network Computation
Authors: Daniel Neil, Jun Haeng Lee, Tobi Delbruck, Shih-Chii Liu
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that a naive run-time delta network implementation offers modest improvements in the number of memory accesses and computes, but optimized training techniques confer higher accuracy at higher speedup. With these optimizations, we demonstrate a 9X reduction in cost with negligible loss of accuracy for the TIDIGITS audio digit recognition benchmark. Similarly, on the large Wall Street Journal (WSJ) speech recognition benchmark, pretrained networks can also be greatly accelerated as delta networks, and trained delta networks show a 5.7X improvement with negligible loss of accuracy. Finally, on an end-to-end CNN-RNN network trained for steering angle prediction in a driving dataset, the RNN cost can be reduced by a substantial 100X. (See the delta-update sketch after the table.) |
| Researcher Affiliation | Collaboration | Institute of Neuroinformatics, UZH and ETH Zurich, Zurich, Switzerland; Samsung Advanced Institute of Technology, Samsung Electronics, Suwon-Si, Republic of Korea. |
| Pseudocode | No | No pseudocode or algorithm blocks explicitly labeled as "Pseudocode" or "Algorithm" were found in the paper. The paper includes mathematical formulations, but not structured pseudocode. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include a direct link to a code repository. It mentions using third-party libraries like Lasagne and Theano. |
| Open Datasets | Yes | The TIDIGITS dataset was used as an initial evaluation task to study the trajectory evolution of delta networks. Single digits (oh and zero through nine), totalling 2464 digits in the training set and 2486 digits in the test set, were transformed in the standard way (Neil & Liu, 2016). The delta network methodology was applied to an RNN trained on the larger WSJ dataset to determine whether it could produce the same gains as seen with the TIDIGITS dataset. This dataset comprised 81 hours of transcribed speech, as described in (Braun et al., 2016). The open driving dataset from comma.ai (Santana & Hotz, 2016) with 7.25 hours of driving data was used. |
| Dataset Splits | Yes | The TIDIGITS dataset was used as an initial evaluation task to study the trajectory evolution of delta networks. Single digits (oh and zero through nine), totalling 2464 digits in the training set and 2486 digits in the test set, were transformed in the standard way (Neil & Liu, 2016). A very large speedup exceeding 100X in the delta network GRU can be seen in Fig. 7, computed for the steering angle prediction task on 2000 consecutive frames (100s) from the validation set. |
| Hardware Specification | Yes | Reported training time is for a single Nvidia GTX 980 Ti GPU. |
| Software Dependencies | No | The networks were trained with Lasagne (Dieleman et al., 2015) powered by Theano (Bergstra et al., 2010). While specific frameworks are named, no version numbers for Lasagne or Theano are provided. |
| Experiment Setup | Yes | Training time is approximately 8 minutes for a 150-epoch experiment. The Q3.4 (i.e. m = 3 and f = 4) format was used for network activation values in all speech experiments... The driving dataset in Sec. 6.4 used Q2.5 activations. Training time of the CNN feature detector was about 8h for 10k updates with a batch size of 200. Training of the RNN part took about 3h for 5k updates with a batch size of 32 samples consisting of 48 frames/sample. (See the Qm.f quantization sketch below.) |
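
The technique summarized in the Research Type row is the delta network update: a recurrent layer transmits and recomputes only those input components whose change since the last transmitted value exceeds a threshold, so only the corresponding weight columns are fetched and multiplied. Below is a minimal NumPy sketch of that idea, assuming a plain matrix-vector pre-activation; the class name `DeltaMatVec`, the threshold value, and the dimensions are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

class DeltaMatVec:
    """Minimal sketch of a delta-network matrix-vector product.

    Rather than recomputing W @ x at every timestep, the layer keeps a
    running pre-activation z and adds W @ dx only for components of x
    whose change exceeds the threshold theta. Illustrative only.
    """

    def __init__(self, W, theta=0.0625):
        self.W = W                            # (out_dim, in_dim) weight matrix
        self.theta = theta                    # delta threshold (here one Q3.4 step)
        self.x_ref = np.zeros(W.shape[1])     # last transmitted input values
        self.z = np.zeros(W.shape[0])         # accumulated pre-activation

    def step(self, x):
        delta = x - self.x_ref
        changed = np.abs(delta) > self.theta  # components worth updating
        # Only the weight columns for changed inputs are touched; this is
        # where the memory-access and compute savings come from.
        self.z += self.W[:, changed] @ delta[changed]
        self.x_ref[changed] = x[changed]      # update reference only where sent
        return self.z

# Usage: a slowly varying input triggers few column updates per timestep.
rng = np.random.default_rng(0)
layer = DeltaMatVec(rng.standard_normal((4, 8)))
x = rng.standard_normal(8)
for t in range(3):
    x = x + 0.01 * rng.standard_normal(8)    # small changes, mostly below theta
    z = layer.step(x)
```

Keeping `x_ref` at the last *transmitted* value (rather than the last observed one) prevents small sub-threshold changes from accumulating into unbounded drift, which is the usual formulation of thresholded delta updates.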
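The Experiment Setup row refers to Qm.f fixed-point activations (m integer bits, f fractional bits). The following sketch shows what rounding to that grid looks like under the conventional Qm.f definition with step 2^-f; `quantize_q` is a hypothetical helper for illustration, not the paper's training code.

```python
import numpy as np

def quantize_q(x, m=3, f=4):
    """Round x to a Qm.f fixed-point grid.

    m integer bits and f fractional bits give a step of 2**-f and a
    representable range of roughly [-2**m, 2**m - 2**-f].
    """
    step = 2.0 ** -f
    lo, hi = -(2.0 ** m), (2.0 ** m) - step
    return np.clip(np.round(x / step) * step, lo, hi)

# Q3.4 activations (speech experiments) vs. Q2.5 (driving dataset)
x = np.array([0.07, 1.234, 9.9, -4.2])
print(quantize_q(x, m=3, f=4))   # step 0.0625,  range [-8, 7.9375]
print(quantize_q(x, m=2, f=5))   # step 0.03125, range [-4, 3.96875]
```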