One-step Spiking Transformer with a Linear Complexity

Authors: Xiaotian Song, Andy Song, Rong Xiao, Yanan Sun

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both static and neuromorphic images show that OST can perform as well as or better than SOTA methods with just one timestep, even for more difficult tasks.
Researcher Affiliation | Academia | 1. College of Computer Science, Sichuan University; 2. School of Computing Technologies, RMIT University. songxt@stu.scu.edu.cn, andy.song@rmit.edu.au, xiaorong.scu@gmail.com, ysun@scu.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The supplementary materials and source code are available at https://github.com/songxt3/OST.
Open Datasets | Yes | OST is validated on both static image classification, involving ImageNet, CIFAR10, and CIFAR100, and neuromorphic image classification, using CIFAR10-DVS [Li et al., 2017] and DVS128 Gesture [Amir et al., 2017]. A hedged loading sketch for the neuromorphic data appears after the table.
Dataset Splits | No | The paper mentions using datasets such as ImageNet, CIFAR10, CIFAR100, CIFAR10-DVS, and DVS128 Gesture, and describes training parameters, but it does not explicitly provide training/validation/test splits (e.g., percentages or exact sample counts) in the main text.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using SpikingJelly but does not specify its version number or any other software dependencies with specific versions.
Experiment Setup | Yes | ImageNet: Following Spikformer, OST training utilizes 224×224 images from ImageNet and Adam [Kingma and Ba, 2014]. The learning rate is initially set to 6e-5 and progressively reduced using cosine decay. The batch size and number of epochs are 16 and 310, respectively. [...] CIFAR10-DVS: The input image size is 128×128, with batch size 16. The learning rate is 1e-3 initially and is reduced using cosine decay. The initial number of timesteps here is not 4 as in Section 4.1 but 16, due to the increased difficulty of classifying neuromorphic images. The training epochs are set to 106. The number of transformer encoder blocks N is set to 2, while the embedding dimension D is 256. A hedged configuration sketch appears after the table.
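
As referenced in the Open Datasets row: the paper uses 16 timesteps for CIFAR10-DVS, which suggests the event streams are integrated into 16 frames per sample. The following is a minimal, hypothetical loading sketch using SpikingJelly's CIFAR10DVS dataset class; the paper does not confirm this exact preprocessing, and the root path and split_by choice are assumptions.

```python
# Hypothetical loading sketch for CIFAR10-DVS with SpikingJelly: integrate
# the event stream into T = 16 frames, matching the 16 timesteps the paper
# reports for this dataset. The root path and split_by are assumptions.
from spikingjelly.datasets.cifar10_dvs import CIFAR10DVS

dataset = CIFAR10DVS(
    root="./data/cifar10_dvs",  # assumed local path
    data_type="frame",          # integrate events into fixed-count frames
    frames_number=16,           # T = 16, per the paper's CIFAR10-DVS setup
    split_by="number",          # split events evenly by count across frames
)

frames, label = dataset[0]
print(frames.shape)  # (16, 2, 128, 128): T frames, 2 polarities, 128×128
```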
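
The ImageNet hyperparameters in the Experiment Setup row (Adam, initial learning rate 6e-5, cosine decay, batch size 16, 310 epochs) map onto a standard PyTorch training loop. The sketch below is an illustration under those assumptions, not the authors' code: the model and data loader are stand-in placeholders, and only the optimizer and scheduler settings come from the paper.

```python
# Minimal sketch of the reported ImageNet training configuration, assuming
# a plain PyTorch loop. Only the learning rate, batch size, epoch count,
# and cosine decay are taken from the paper; model and data are placeholders.
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder backbone; the actual OST network is in the linked repository.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 224 * 224, 1000),
)

# Placeholder data: one batch of 16 random 224×224 RGB images.
train_loader = [(torch.randn(16, 3, 224, 224), torch.randint(0, 1000, (16,)))]

epochs = 310                                            # per the paper
optimizer = Adam(model.parameters(), lr=6e-5)           # initial LR 6e-5
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)  # cosine decay

for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the cosine schedule once per epoch
```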