Spikformer: When Spiking Neural Network Meets Transformer

Authors: Zhaokun Zhou, Yuesheng Zhu, Chao He, Yaowei Wang, Shuicheng Yan, Yonghong Tian, Li Yuan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed architecture outperforms state-of-the-art SNNs on both static and neuromorphic datasets. We conduct experiments on both static datasets CIFAR, ImageNet (Deng et al., 2009), and neuromorphic datasets CIFAR10-DVS, DVS128 Gesture (Amir et al., 2017) to evaluate the performance of Spikformer. Ablation studies on the effects of the SSA module and Spikformer are presented in Sec. 4.3.
Researcher Affiliation | Collaboration | 1 Peking University; 2 Peng Cheng Laboratory; 3 Sea AI Lab; 4 Shenzhen EEGSmart Technology Co., Ltd. Contact: {yuanli-ece}@pku.edu.cn
Pseudocode | No | The paper includes equations and descriptions of processes but does not present any formal pseudocode blocks or sections labeled "Algorithm". (A rough code sketch of the SSA computation follows the table.)
Open Source Code | No | Our codes of Spikformer models are uploaded as supplementary material and will be available on GitHub after review.
Open Datasets | Yes | We conduct experiments on both static datasets CIFAR, ImageNet (Deng et al., 2009), and neuromorphic datasets CIFAR10-DVS, DVS128 Gesture (Amir et al., 2017) to evaluate the performance of Spikformer.
Dataset Splits | Yes | ImageNet contains around 1.3 million 1,000-class images for training and 50,000 images for validation. CIFAR provides 50,000 train and 10,000 test images. (A loading snippet for the standard CIFAR split follows the table.)
Hardware Specification | No | The paper does not specify the hardware (e.g., specific GPU or CPU models) used for running the experiments.
Software Dependencies | No | The models for conducting experiments are implemented based on PyTorch (Paszke et al., 2019), SpikingJelly [2], and the PyTorch Image Models library (timm) [3]. [2] https://github.com/fangwei123456/spikingjelly [3] https://github.com/rwightman/pytorch-image-models. No specific version numbers for PyTorch, SpikingJelly, or timm are provided. (A version-recording snippet follows the table.)
Experiment Setup | Yes | For the static datasets: the optimizer is AdamW and the batch size is set to 128 or 256 during 310 training epochs with a cosine-decay learning rate whose initial value is 0.0005; the scaling factor is 0.125 when training on ImageNet and CIFAR. For the neuromorphic datasets: the time-step of the spiking neuron is 10 or 16; the training epoch is 200 for DVS128 Gesture and 106 for CIFAR10-DVS; the optimizer is AdamW and the batch size is set to 16; the learning rate is initialized to 0.1 and reduced with cosine decay. (A configuration sketch for the static-dataset settings follows the table.)
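
Since the Pseudocode row notes the paper gives only equations, here is a minimal single-head sketch of the Spiking Self-Attention (SSA) computation those equations describe: spike-form Q, K, V produced by Linear-BN-spike branches, no softmax, and a fixed scaling factor in place of 1/sqrt(d). It is an illustration only. `StepSpike`, `SSASketch`, and `_spike_branch` are hypothetical names; the spike function is a plain Heaviside step rather than the paper's LIF neuron with surrogate gradients (via SpikingJelly), and multi-head splitting is omitted.

```python
import torch
import torch.nn as nn

class StepSpike(nn.Module):
    """Placeholder spike activation: a plain Heaviside step.
    The paper uses LIF neurons with surrogate gradients instead."""
    def forward(self, x):
        return (x > 0).float()

class SSASketch(nn.Module):
    """Single-head sketch of Spiking Self-Attention.
    Input x has shape (T, N, D): time steps, tokens, channels."""
    def __init__(self, dim, scale=0.125):
        super().__init__()
        self.scale = scale          # fixed scaling factor s from the paper
        self.spike = StepSpike()
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.bnq = nn.BatchNorm1d(dim)
        self.bnk = nn.BatchNorm1d(dim)
        self.bnv = nn.BatchNorm1d(dim)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.bn_out = nn.BatchNorm1d(dim)

    def _spike_branch(self, x, linear, bn):
        T, N, D = x.shape
        y = bn(linear(x).flatten(0, 1)).reshape(T, N, D)  # BN over channels
        return self.spike(y)                              # binary spike tensor

    def forward(self, x):
        T, N, D = x.shape
        q = self._spike_branch(x, self.wq, self.bnq)
        k = self._spike_branch(x, self.wk, self.bnk)
        v = self._spike_branch(x, self.wv, self.bnv)
        # Spike-spike matrix products are non-negative, so no softmax is needed.
        attn = self.spike(q @ k.transpose(-2, -1) @ v * self.scale)
        out = self.bn_out(self.proj(attn).flatten(0, 1)).reshape(T, N, D)
        return self.spike(out)
```

As a smoke test, `SSASketch(dim=384)(torch.rand(4, 64, 384))` returns a binary tensor of the same (T, N, D) shape.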
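
For the Dataset Splits row, the standard CIFAR partition the paper relies on can be materialized with torchvision; this is an illustrative loading snippet, not the authors' data pipeline.

```python
from torchvision import datasets

# Standard CIFAR-10 split quoted in the row: 50,000 train / 10,000 test images.
train_set = datasets.CIFAR10(root="./data", train=True, download=True)
test_set = datasets.CIFAR10(root="./data", train=False, download=True)
print(len(train_set), len(test_set))  # 50000 10000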
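
Because the Software Dependencies row flags the missing version pins, a reproduction should at least record the versions actually installed. A small standard-library sketch:

```python
# Record the versions of the three libraries the paper names (PyTorch,
# SpikingJelly, timm), since the paper itself pins none of them.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "spikingjelly", "timm"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```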
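
Finally, the static-dataset settings quoted in the Experiment Setup row translate into a short PyTorch configuration. This is a hedged sketch, not the authors' training script: the stand-in `model`, the inner loop, warmup, and weight-decay values are all assumptions, since the row states only the optimizer, initial learning rate, schedule, epoch count, and batch size.

```python
import torch
from torch import nn

# Hypothetical stand-in; the actual Spikformer definition is in the
# authors' (not yet released) code.
model = nn.Linear(384, 1000)

# Settings from the row: AdamW, initial lr 0.0005, cosine decay, 310 epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=310)

for epoch in range(310):
    # ... per-batch forward/backward passes with batch size 128 or 256 ...
    optimizer.step()   # placeholder for the inner training loop
    scheduler.step()   # decay the learning rate once per epoch
```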