One-step Spiking Transformer with a Linear Complexity
Authors: Xiaotian Song, Andy Song, Rong Xiao, Yanan Sun
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both static and neuromorphic images show that OST can perform as well as or better than SOTA methods with just one timestep, even for more difficult tasks. |
| Researcher Affiliation | Academia | ¹College of Computer Science, Sichuan University; ²School of Computing Technologies, RMIT University. songxt@stu.scu.edu.cn, andy.song@rmit.edu.au, xiaorong.scu@gmail.com, ysun@scu.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The supplementary materials and source code are available at https://github.com/songxt3/OST. |
| Open Datasets | Yes | OST is validated on both static image classification, involving ImageNet, CIFAR10, and CIFAR100, and neuromorphic image classification, using CIFAR10-DVS [Li et al., 2017] and DVS128 Gesture [Amir et al., 2017]. |
| Dataset Splits | No | The paper mentions using the ImageNet, CIFAR10, CIFAR100, CIFAR10-DVS, and DVS128 Gesture datasets and describes training parameters, but it does not explicitly provide training/validation/test dataset splits (e.g., percentages or exact sample counts) in the main text. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using SpikingJelly but does not specify its version number or any other software dependencies with their specific versions. |
| Experiment Setup | Yes | ImageNet. Following Spikformer, OST training uses 224×224 images from ImageNet and Adam [Kingma and Ba, 2014]. The learning rate is initially set to 6e-5 and progressively reduced with cosine decay. The batch size and the number of epochs are 16 and 310, respectively. [...] CIFAR10-DVS. ... The input image size is 128×128, with batch size 16. The learning rate is 1e-3 initially and decays following a cosine schedule. The initial number of timesteps here is not 4 as in Section 4.1, but 16, due to the increased difficulty of classifying neuromorphic images. The number of training epochs is set to 106. The number of transformer encoder blocks N is set to 2, while the embedding dimension D is 256. |
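
The hyperparameters quoted in the Experiment Setup row can be collected into a short configuration sketch. The snippet below is a minimal PyTorch illustration of the reported optimizer and cosine schedule, assuming per-epoch scheduler stepping; the `OSTModel` name, the `CONFIGS` dictionary keys, and the helper `build_optimizer_and_scheduler` are hypothetical placeholders and are not taken from the authors' released code.

```python
# Minimal sketch (assumed, not the authors' implementation) of the reported
# training settings: Adam with a cosine-decayed learning rate, per dataset.
import torch

# Values as quoted in the "Experiment Setup" row; anything not stated there
# (e.g., weight decay, warmup) is left at library defaults.
CONFIGS = {
    "imagenet": dict(image_size=224, batch_size=16, epochs=310, lr=6e-5,
                     timesteps=1),       # OST is evaluated with one timestep
    "cifar10_dvs": dict(image_size=128, batch_size=16, epochs=106, lr=1e-3,
                        timesteps=16,    # initial timesteps for neuromorphic data
                        encoder_blocks=2, embed_dim=256),
}

def build_optimizer_and_scheduler(model: torch.nn.Module, cfg: dict):
    """Adam optimizer plus cosine decay over the reported number of epochs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr"])
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=cfg["epochs"])
    return optimizer, scheduler

# Usage (model and dataloader construction omitted; `OSTModel` is a placeholder):
# model = OSTModel(**CONFIGS["cifar10_dvs"])
# optimizer, scheduler = build_optimizer_and_scheduler(model, CONFIGS["cifar10_dvs"])
# ... train one epoch ...; scheduler.step()
```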