Spikformer: When Spiking Neural Network Meets Transformer
Authors: Zhaokun Zhou, Yuesheng Zhu, Chao He, Yaowei Wang, Shuicheng Yan, Yonghong Tian, Li Yuan
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed architecture outperforms the state-of-the-art SNNs on both static and neuromorphic datasets. We conduct experiments on both static datasets CIFAR, ImageNet (Deng et al., 2009), and neuromorphic datasets CIFAR10-DVS, DVS128 Gesture (Amir et al., 2017) to evaluate the performance of Spikformer. We conduct ablation studies to show the effects of the SSA module and Spikformer in Sec. 4.3. |
| Researcher Affiliation | Collaboration | 1Peking University, 2Peng Cheng Laboratory, 3Sea AI Lab, 4Shenzhen EEGSmart Technology Co., Ltd. {yuanli-ece}@pku.edu.cn |
| Pseudocode | No | The paper includes equations and descriptions of processes but does not present any formal pseudocode blocks or sections labeled as "Algorithm". |
| Open Source Code | No | Our codes of Spikformer models are uploaded as supplementary material and will be available on GitHub after review. |
| Open Datasets | Yes | We conduct experiments on both static datasets CIFAR, ImageNet (Deng et al., 2009), and neuromorphic datasets CIFAR10-DVS, DVS128 Gesture (Amir et al., 2017) to evaluate the performance of Spikformer. |
| Dataset Splits | Yes | ImageNet contains around 1.3 million 1,000-class images for training and 50,000 images for validation. CIFAR provides 50,000 train and 10,000 test images. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., specific GPU or CPU models) used for running the experiments. |
| Software Dependencies | No | The models for conducting experiments are implemented based on PyTorch (Paszke et al., 2019), SpikingJelly [2] and the PyTorch Image Models library (timm) [3]. [2] https://github.com/fangwei123456/spikingjelly [3] https://github.com/rwightman/pytorch-image-models. No specific version numbers for PyTorch, SpikingJelly, or timm are provided. |
| Experiment Setup | Yes | For the static datasets ImageNet and CIFAR, the optimizer is AdamW, the batch size is set to 128 or 256 during 310 training epochs with a cosine-decay learning rate whose initial value is 0.0005, and the scaling factor is 0.125. For the neuromorphic datasets, the time-step of the spiking neuron is 10 or 16; the training epoch is 200 for DVS128 Gesture and 106 for CIFAR10-DVS; the optimizer is AdamW, the batch size is set to 16, and the learning rate is initialized to 0.1 and reduced with cosine decay. |
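The cosine-decay learning-rate schedule cited in the setup above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name, the zero floor `lr_min`, and the per-epoch (rather than per-step) granularity are assumptions; the initial value 0.0005 and the 310-epoch budget are taken from the quoted ImageNet/CIFAR setup.

```python
import math

def cosine_decay_lr(epoch, total_epochs=310, lr_init=5e-4, lr_min=0.0):
    """Return the learning rate for a given epoch under standard cosine decay.

    The rate starts at lr_init (0.0005 in the quoted setup) and follows half a
    cosine wave down to lr_min by total_epochs. Names and the lr_min floor are
    illustrative assumptions, not details from the paper.
    """
    progress = min(epoch / total_epochs, 1.0)
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * progress))

print(cosine_decay_lr(0))    # initial value: 0.0005
print(cosine_decay_lr(155))  # halfway: 0.00025
print(cosine_decay_lr(310))  # end of training: 0.0
```

In a PyTorch training loop, the same shape is typically obtained from `torch.optim.lr_scheduler.CosineAnnealingLR` wrapped around an `AdamW` optimizer.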