Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training
Authors: Xi Chen, Chang Gao, Zuowen Wang, Longbiao Cheng, Sheng Zhou, Shih-Chii Liu, Tobi Delbruck
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results show a reduction of 80% in matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss. Logic simulations of a hardware accelerator designed for the training algorithm show 2-10X speedup in matrix computations for an activation sparsity range of 50%-90%. |
| Researcher Affiliation | Academia | ¹Sensors Group, Institute of Neuroinformatics, University of Zurich and ETH Zurich; ²Department of Microelectronics, Delft University of Technology |
| Pseudocode | No | The paper provides mathematical formulations and conceptual diagrams but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'For software experiments we implement Delta RNNs in Pytorch using custom functions for forward and backward propagation.' but does not provide any explicit statement about or link to open-source code (an illustrative sketch of such a custom function appears after the table). |
| Open Datasets | Yes | We use the FSCD (Lugosch et al. 2019) to verify the mathematical correctness of the sparse version of BPTT, and to evaluate the accuracy and cost of the Delta RNNs on Spoken Language Understanding (SLU) tasks. ... we use GSCD v2 (Warden 2018), a dataset frequently used for benchmarking ASIC and FPGA keyword spotting implementations (Shan et al. 2020; Giraldo, Jain, and Verhelst 2021). |
| Dataset Splits | No | The paper mentions 'train/test sets with the ratio 8:2' for the GSCD, but does not explicitly describe a validation set or its split for either dataset. |
| Hardware Specification | Yes | For software experiments we implement Delta RNNs in Pytorch using custom functions for forward and backward propagation. Software experiments are conducted on a GTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions implementing Delta RNNs in PyTorch but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The model is trained for 80 epochs with learning rate 1e-3 and batch size 32. We use cosine annealing scheduler, ADAM optimizer, and weight decay coefficient of 1e-2. (A hedged sketch of this setup follows the table.) |
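
The Experiment Setup row reports the training hyperparameters. Below is a minimal, hedged sketch of that configuration in PyTorch (80 epochs, Adam, learning rate 1e-3, weight decay 1e-2, cosine annealing, batch size 32); the model, feature dimensions, and loss are placeholders, not the paper's 56k-parameter DeltaLSTM or its Fluent Speech Commands pipeline.

```python
import torch

# Hedged sketch of the reported setup: 80 epochs, Adam, lr 1e-3,
# weight decay 1e-2, cosine annealing, batch size 32.
# Model, features, and loss are placeholders (assumptions), not the paper's.
model = torch.nn.Sequential(
    torch.nn.Linear(40, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80)

# Placeholder batch standing in for a DataLoader with batch_size=32.
features = torch.randn(32, 40)
labels = torch.randint(0, 10, (32,))

for epoch in range(80):
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine annealing stepped once per epoch
```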
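The Open Source Code and Hardware Specification rows quote the paper's statement that Delta RNNs were implemented in PyTorch with custom forward and backward functions. Since no code is released, the sketch below only illustrates one way such a custom `torch.autograd.Function` could express the symmetric temporal sparsity the paper describes: input changes below a delta threshold are zeroed in the forward pass and the same mask is reused on the incoming gradient. The `DeltaMask` name, tensor shapes, and threshold value are assumptions, not the authors' implementation.

```python
import torch


class DeltaMask(torch.autograd.Function):
    """Hedged sketch: threshold changes in the forward pass and reuse the same
    sparsity mask on the incoming gradient (illustration only, not the
    authors' released code)."""

    @staticmethod
    def forward(ctx, delta, threshold):
        # Keep only elements whose change exceeds the delta threshold.
        mask = (delta.abs() >= threshold).to(delta.dtype)
        ctx.save_for_backward(mask)
        return delta * mask

    @staticmethod
    def backward(ctx, grad_output):
        # Symmetric sparsity: the gradient is masked with the same pattern,
        # so the skipped elements also need no matrix work during BPTT.
        (mask,) = ctx.saved_tensors
        return grad_output * mask, None  # no gradient w.r.t. the threshold


# Toy usage on the change between two consecutive RNN inputs/states.
x_prev = torch.randn(4, 16)
x_curr = torch.randn(4, 16, requires_grad=True)
sparse_delta = DeltaMask.apply(x_curr - x_prev, 0.5)
sparse_delta.pow(2).sum().backward()
print(x_curr.grad)  # zero wherever the forward mask was zero
```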