Spatial-Temporal Self-Attention for Asynchronous Spiking Neural Networks

Authors: Yuchen Wang, Kexin Shi, Chengzhuo Lu, Yuguo Liu, Malu Zhang, Hong Qu

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on popular neuromorphic datasets and speech datasets, including DVS128 Gesture, CIFAR10-DVS, and Google Speech Commands, and our experimental results outperform other state-of-the-art models.
Researcher Affiliation | Academia | Yuchen Wang, Kexin Shi, Chengzhuo Lu, Yuguo Liu, Malu Zhang and Hong Qu, School of Computer Science and Engineering, University of Electronic Science and Technology of China. Emails: yuchenwang@std.uestc.edu.cn, kexinshi@std.uestc.edu.cn, 2019270101012@std.uestc.edu.cn, liuyuguo@std.uestc.edu.cn, maluzhang@uestc.edu.cn, hongqu@uestc.edu.cn
Pseudocode | No | The paper describes the proposed methods using mathematical formulas and prose, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Codes are available at https://github.com/ppppps/STSA_4_Asyn_SNN.
Open Datasets | Yes | In order to verify the effectiveness of the proposed method, we conduct object recognition experiments on neuromorphic vision datasets DVS128 Gesture [Amir et al., 2017] and CIFAR10-DVS [Li et al., 2017], and speech recognition experiments on Google Speech Commands V1 and Google Speech Commands V2 [Warden, 2018].
Dataset Splits | No | For DVS128 Gesture: "the owner of the dataset divides 1,176 of them into the training set and 288 into the test set." For CIFAR10-DVS: "researchers divide the first 900 samples of each category into the training set and the remaining 100 samples into the test set. We also used this 9:1 division ratio in our experiments." For Google Speech Commands: "randomly select 1500 samples per command and split them into the training sets and test sets at a ratio of 8:2." No explicit validation split is mentioned for any of the datasets; a minimal split sketch follows the table.
Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud resources).
Software Dependencies | No | The paper mentions software components like the AdamW optimizer and "temporal efficient training," but it does not provide specific version numbers for any software libraries, frameworks, or programming languages.
Experiment Setup | Yes | The initial learning rate is set to 0.01 and we use a cosine learning rate decay schedule. We adopt the loss function of temporal efficient training [Deng et al., 2022] and the L2 penalty with a value of 1e-4 is also added. In the tokenization process, we used four 3×3 convolutional layers in the convolutional stem. Each convolutional layer is followed by a max-pooling layer with a stride of 2 to divide the original image into 16×16 patches. The LIF neurons of the constructed SNNs adopted a uniform setting: their firing threshold was set to 1, and the decay coefficient τ was set to 0.5. The batch size of both training and testing is 32, and the number of epochs is set to 1000. A hedged configuration sketch follows the table.
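To make the split ratios in the Dataset Splits row concrete, here is a minimal per-category index split in plain Python. The function name split_indices, the seed value, and the use of the standard-library random module are illustrative assumptions, not the authors' code; only the 9:1 and 8:2 ratios and the sequential-vs-random selection come from the paper's quoted text.

```python
import random

def split_indices(n_samples, train_ratio, shuffle=True, seed=0):
    """Split sample indices into train/test index lists at the given ratio."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_ratio)
    return indices[:n_train], indices[n_train:]

# CIFAR10-DVS: first 900 samples per category for training,
# remaining 100 for testing (9:1, sequential, no shuffle).
cifar_train, cifar_test = split_indices(1000, train_ratio=0.9, shuffle=False)

# Google Speech Commands: 1500 randomly selected samples per command,
# split into training and test sets at a ratio of 8:2.
gsc_train, gsc_test = split_indices(1500, train_ratio=0.8, shuffle=True, seed=42)
```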
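The Experiment Setup row translates naturally into a training configuration. Below is a hedged PyTorch sketch, not the authors' implementation: the channel widths, batch normalization, embedding dimension, the two-channel DVS input, and the hard-reset LIF update are assumptions; only the four 3×3 conv layers with stride-2 max-pooling, AdamW at lr=0.01 with cosine decay, the 1e-4 L2 penalty (modeled here as weight decay), the firing threshold of 1, and the decay coefficient τ=0.5 come from the paper.

```python
import torch
import torch.nn as nn

class ConvStem(nn.Module):
    """Four 3x3 conv layers, each followed by a stride-2 max-pool, so a
    128x128 DVS frame becomes an 8x8 token grid (one token per 16x16 patch)."""
    def __init__(self, in_channels=2, embed_dim=256):  # both values assumed
        super().__init__()
        widths = [in_channels, 64, 128, 256, embed_dim]  # assumed channel widths
        layers = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.BatchNorm2d(c_out),           # BN is an assumption
                       nn.MaxPool2d(kernel_size=2, stride=2)]
        self.stem = nn.Sequential(*layers)

    def forward(self, x):                     # x: (batch, 2, 128, 128)
        x = self.stem(x)                      # -> (batch, embed_dim, 8, 8)
        return x.flatten(2).transpose(1, 2)   # -> (batch, 64, embed_dim)

def lif_step(v, x, tau=0.5, v_th=1.0):
    """One discrete LIF update with the paper's decay (0.5) and threshold (1);
    the hard reset after spiking is an assumption."""
    v = tau * v + x                 # leaky integration of input current
    spike = (v >= v_th).float()     # fire where the membrane crosses threshold
    return spike, v * (1.0 - spike)

model = ConvStem()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.01, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)
```

Note that AdamW applies decoupled weight decay rather than a classical L2 term added to the loss; the paper's wording ("L2 penalty with a value of 1e-4") is compatible with either reading.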