Efficient recurrent architectures through activity sparsity and sparse back-propagation through time

Authors: Anand Subramoney, Khaleelulla Khan Nazeer, Mark Schöne, Christian Mayr, David Kappel

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated our model on gesture prediction, which is a popular real-world benchmark for RNNs... We evaluated the EGRU on the sequential MNIST and permuted sequential MNIST tasks... We evaluated our model on language modeling tasks based on the Penn Treebank... and the WikiText-2 dataset... Table 1: Model comparison for the DVS Gesture recognition task. Table 2: Model comparison on sequential MNIST... Table 3: Model comparison on Penn Treebank and WikiText-2.
Researcher Affiliation | Academia | Anand Subramoney (1,2), Khaleelulla Khan Nazeer (3), Mark Schöne (3), Christian Mayr (3,4), David Kappel (1). 1: Institute for Neural Computation, Ruhr University Bochum, Germany; 2: Royal Holloway, University of London; 3: Faculty of Electrical and Computer Engineering, Technische Universität Dresden, Dresden, Germany; 4: Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, Dresden, Germany.
Pseudocode | No | The paper contains “Fig. 1: Illustration of EGRU.” but no section or figure explicitly labeled “Pseudocode” or “Algorithm”; the method is described in prose and mathematical equations. (A hedged sketch of an event-gated GRU step appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/KhaleelKhan/EvNN/.
Open Datasets | Yes | The DVS128 Gesture Dataset (Amir et al., 2017) provides sparse event-based inputs... We evaluated the EGRU on the sequential MNIST and permuted sequential MNIST tasks (Le et al., 2015)... We evaluated our model on language modeling tasks based on the Penn Treebank (Marcus et al., 1993) dataset and the WikiText-2 dataset (Merity et al., 2017). (A sketch of the standard sequential / permuted sequential MNIST construction appears after this table.)
Dataset Splits | No | The paper mentions training and testing on datasets (“We trained a 1-layer EGRU”, “test activity”, “test scores”) and discusses data augmentation, but it does not provide explicit numerical details (percentages or sample counts) for train/validation/test splits, nor does it refer to specific predefined splits by name or citation that would define these ratios.
Hardware Specification | No | The paper states: “...this formulation will also allow the model to run efficiently on CPU-based nodes when implemented using appropriate software paradigms. Moreover, an implementation on novel neuromorphic hardware like Davies et al. (2018); Höppner et al. (2017), that is geared towards event-based computation, can make the model orders of magnitude more energy efficient (Ostrau et al., 2022).” It does not specify the hardware used for the reported experimental results.
Software Dependencies | No | We demonstrate the task performance and activity sparsity of the model implemented in PyTorch. The paper refers to “publicly available libraries (Appendix section G)”, but this appendix is not provided, and the main text does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We trained a 1-layer EGRU with 590 units (matching the number of parameters with a 512-unit LSTM). Our models consisted of three stacked EGRU cells without skip connections. DropConnect (Wan et al., 2013) was applied to the hidden-to-hidden weights. The weights of the final softmax layer were tied to the embedding layer (Inan et al., 2017; Press and Wolf, 2017). Further experimental details, ablation studies and statistics over different runs can be found in the supplement sections E.1, E.1.1 and tables S1, S4 respectively. (A hedged sketch of this stacked configuration appears below.)
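
Since the method is described only in prose and equations, the following is a hedged, pseudocode-style sketch of a single event-gated recurrent step in PyTorch. The gate equations follow a standard GRU; the threshold `theta`, the Heaviside output gate, and the decision to omit any post-event state handling are reading choices made here, not equations quoted from the paper.

```python
# Hedged sketch of one event-gated GRU (EGRU-style) step. Gate equations follow
# a standard GRU; the threshold `theta` and the Heaviside output gate are
# assumptions based on the prose description, and post-event state handling
# (e.g. a reset) is deliberately omitted.
import torch


def event_gated_gru_step(x, y_prev, c_prev, w_ih, w_hh, b, theta):
    # x: (batch, D); y_prev, c_prev: (batch, H)
    # w_ih: (3H, D); w_hh: (3H, H); b: (3H,); theta: scalar or (H,)
    H = c_prev.shape[-1]
    gi = x @ w_ih.T + b        # input contribution to the three gates
    gh = y_prev @ w_hh.T       # recurrent contribution (y_prev is mostly zeros)
    u = torch.sigmoid(gi[:, :H] + gh[:, :H])             # update gate
    r = torch.sigmoid(gi[:, H:2 * H] + gh[:, H:2 * H])   # reset gate
    z = torch.tanh(gi[:, 2 * H:] + r * gh[:, 2 * H:])    # candidate state
    c = u * z + (1.0 - u) * c_prev                        # internal state
    events = (c > theta).to(c.dtype)                      # Heaviside event gate
    y = c * events        # output is exactly zero unless the unit crosses theta
    return y, c
```

Because most entries of y are exactly zero, the recurrent matrix multiplication at the next step, and the corresponding backward pass through time, only need to touch the units that fired, which is the activity sparsity the title refers to.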
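
The sequential and permuted sequential MNIST tasks quoted above are conventionally built by flattening each 28×28 image into a length-784 sequence of pixel values, with a fixed random permutation applied in the permuted variant. A minimal sketch under that convention follows; the use of torchvision, the permutation seed, and the batch size are illustrative and not taken from the paper.

```python
# Hedged sketch: constructing (permuted) sequential MNIST inputs with torchvision.
# The permutation seed, batch size, and torchvision calls are illustrative
# choices, not details taken from the paper.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

PERMUTE = True  # False -> sequential MNIST, True -> permuted sequential MNIST
perm = torch.randperm(28 * 28, generator=torch.Generator().manual_seed(0))


def to_sequence(img):
    # img: (1, 28, 28) tensor -> sequence of 784 pixel values, shape (784, 1)
    seq = img.view(-1, 1)
    return seq[perm] if PERMUTE else seq


train_set = datasets.MNIST(
    root="data", train=True, download=True,
    transform=transforms.Compose([transforms.ToTensor(), to_sequence]),
)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
```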
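
To make the quoted setup concrete, here is a hedged sketch of a language model with three stacked recurrent cells, DropConnect-style masking of the hidden-to-hidden weights, and the softmax decoder tied to the embedding, as described for the Penn Treebank / WikiText-2 experiments. A plain GRU cell stands in for the paper's EGRU cell, and all layer sizes and the drop probability are placeholder values; the authors' actual implementation is in the linked EvNN repository.

```python
# Hedged sketch of the quoted language-model setup: three stacked recurrent
# cells, DropConnect-style masking of the hidden-to-hidden weights, and the
# softmax decoder tied to the input embedding. nn.GRUCell stands in for the
# event-gated cell; sizes and the drop probability are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gru_step(cell, x, h, weight_hh):
    # Manual GRU update so a DropConnect-masked hidden-to-hidden matrix can be
    # substituted for cell.weight_hh (PyTorch gate order: r, z, n).
    gi = F.linear(x, cell.weight_ih, cell.bias_ih)
    gh = F.linear(h, weight_hh, cell.bias_hh)
    i_r, i_z, i_n = gi.chunk(3, dim=1)
    h_r, h_z, h_n = gh.chunk(3, dim=1)
    r = torch.sigmoid(i_r + h_r)
    z = torch.sigmoid(i_z + h_z)
    n = torch.tanh(i_n + r * h_n)
    return (1.0 - z) * n + z * h


class StackedRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=400, hidden_dim=1350, dropconnect=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Three stacked cells; the last layer projects back to emb_dim so the
        # decoder can share its weight matrix with the embedding.
        dims = [emb_dim, hidden_dim, hidden_dim, emb_dim]
        self.cells = nn.ModuleList(
            [nn.GRUCell(dims[i], dims[i + 1]) for i in range(3)]
        )
        self.dropconnect = dropconnect
        self.decoder = nn.Linear(emb_dim, vocab_size, bias=False)
        self.decoder.weight = self.embedding.weight  # weight tying

    def forward(self, tokens):
        # tokens: (seq_len, batch) of token ids -> logits (seq_len, batch, vocab)
        x = self.embedding(tokens)
        states = [x.new_zeros(x.shape[1], c.hidden_size) for c in self.cells]
        # One DropConnect mask per hidden-to-hidden matrix, sampled per forward pass.
        masks = [
            F.dropout(torch.ones_like(c.weight_hh), p=self.dropconnect,
                      training=self.training)
            for c in self.cells
        ]
        outputs = []
        for step in x:  # iterate over time steps
            h = step
            for i, cell in enumerate(self.cells):
                states[i] = gru_step(cell, h, states[i], cell.weight_hh * masks[i])
                h = states[i]
            outputs.append(h)
        return self.decoder(torch.stack(outputs))
```

Masking the hidden-to-hidden matrices mirrors the Wan et al. (2013) DropConnect regularizer mentioned in the row above, and assigning the embedding matrix to the decoder implements the Inan et al. (2017) / Press and Wolf (2017) weight-tying scheme; the single-layer 590-unit configuration quoted for DVS Gesture would use the same kind of cell with different sizes.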