Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training
Authors: Xi Chen, Chang Gao, Zuowen Wang, Longbiao Cheng, Sheng Zhou, Shih-Chii Liu, Tobi Delbruck
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results show a reduction of 80% in matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss. Logic simulations of a hardware accelerator designed for the training algorithm show 2-10X speedup in matrix computations for an activation sparsity range of 50%-90%. |
| Researcher Affiliation | Academia | ¹Sensors Group, Institute of Neuroinformatics, University of Zurich and ETH Zurich; ²Department of Microelectronics, Delft University of Technology |
| Pseudocode | No | The paper provides mathematical formulations and conceptual diagrams but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'For software experiments we implement Delta RNNs in Pytorch using custom functions for forward and backward propagation.' but does not provide any explicit statement about or link to open-source code (an illustrative sketch of such a custom function appears after the table). |
| Open Datasets | Yes | We use the FSCD (Lugosch et al. 2019) to verify the mathematical correctness of the sparse version of BPTT, and to evaluate the accuracy and cost of the Delta RNNs on Spoken Language Understanding (SLU) tasks. ... we use GSCD v2 (Warden 2018), a dataset frequently used for benchmarking ASIC and FPGA keyword spotting implementations (Shan et al. 2020; Giraldo, Jain, and Verhelst 2021). |
| Dataset Splits | No | The paper mentions 'train/test sets with the ratio 8:2' for the GSCD, but does not explicitly describe a validation set or its split for either dataset. |
| Hardware Specification | Yes | For software experiments we implement Delta RNNs in Pytorch using custom functions for forward and backward propagation. Software experiments are conducted on a GTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions implementing Delta RNNs in PyTorch but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The model is trained for 80 epochs with learning rate 1e-3 and batch size 32. We use cosine annealing scheduler, ADAM optimizer, and weight decay coefficient of 1e-2. (A hedged sketch of this setup follows the table.) |
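
The Experiment Setup row reports the training hyperparameters. Below is a minimal, hedged sketch of that configuration in PyTorch (80 epochs, Adam, learning rate 1e-3, weight decay 1e-2, cosine annealing, batch size 32); the model, feature dimensions, and loss are placeholders, not the paper's 56k-parameter DeltaLSTM or its Fluent Speech Commands pipeline.

```python
import torch

# Hedged sketch of the reported setup: 80 epochs, Adam, lr 1e-3,
# weight decay 1e-2, cosine annealing, batch size 32.
# Model, features, and loss are placeholders (assumptions), not the paper's.
model = torch.nn.Sequential(
    torch.nn.Linear(40, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80)

# Placeholder batch standing in for a DataLoader with batch_size=32.
features = torch.randn(32, 40)
labels = torch.randint(0, 10, (32,))

for epoch in range(80):
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine annealing stepped once per epoch
```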
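The Open Source Code and Hardware Specification rows quote the paper's statement that Delta RNNs were implemented in PyTorch with custom forward and backward functions. Since no code is released, the sketch below only illustrates one way such a custom `torch.autograd.Function` could express the symmetric temporal sparsity the paper describes: input changes below a delta threshold are zeroed in the forward pass and the same mask is reused on the incoming gradient. The `DeltaMask` name, tensor shapes, and threshold value are assumptions, not the authors' implementation.

```python
import torch


class DeltaMask(torch.autograd.Function):
    """Hedged sketch: threshold changes in the forward pass and reuse the same
    sparsity mask on the incoming gradient (illustration only, not the
    authors' released code)."""

    @staticmethod
    def forward(ctx, delta, threshold):
        # Keep only elements whose change exceeds the delta threshold.
        mask = (delta.abs() >= threshold).to(delta.dtype)
        ctx.save_for_backward(mask)
        return delta * mask

    @staticmethod
    def backward(ctx, grad_output):
        # Symmetric sparsity: the gradient is masked with the same pattern,
        # so the skipped elements also need no matrix work during BPTT.
        (mask,) = ctx.saved_tensors
        return grad_output * mask, None  # no gradient w.r.t. the threshold


# Toy usage on the change between two consecutive RNN inputs/states.
x_prev = torch.randn(4, 16)
x_curr = torch.randn(4, 16, requires_grad=True)
sparse_delta = DeltaMask.apply(x_curr - x_prev, 0.5)
sparse_delta.pow(2).sum().backward()
print(x_curr.grad)  # zero wherever the forward mask was zero
```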