QKFormer: Hierarchical Spiking Transformer using Q-K Attention
Authors: Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | QKFormer achieves significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, with comparable size to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1K, substantially outperforming Spikformer by 10.84%. To the best of our knowledge, this is the first time that directly trained SNNs have exceeded 85% accuracy on ImageNet-1K. |
| Researcher Affiliation | Collaboration | Pengcheng Laboratory; Harbin Institute of Technology; Peking University |
| Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models are available at https://github.com/zhouchenlin2096/QKFormer. |
| Open Datasets | Yes | We evaluate QKFormer on static image classification and neuromorphic classification. The former includes ImageNet-1K [39] and CIFAR10/100 [40]. The latter contains CIFAR10-DVS [41] and DVS128 Gesture [42]. |
| Dataset Splits | Yes | It contains 1.28 million images for training and 50k images for validation, with a total of 1,000 categories. |
| Hardware Specification | Yes | We use 8 NVIDIA Tesla V100 SXM2 32GB GPUs when training models on ImageNet, while 1 GPU is used to train on the other datasets (CIFAR10, CIFAR100, DVS128 Gesture, CIFAR10-DVS). |
| Software Dependencies | No | All experiments are implemented based on PyTorch [53], SpikingJelly [54] and timm [55]. (Specific version numbers for PyTorch, SpikingJelly, and timm are not provided.) |
| Experiment Setup | Yes | In this experiment, we use AdamW as the optimizer, adopted with a base learning rate of 6×10⁻⁴. The actual learning rate is calculated as Batch Size / 256 multiplied by the base learning rate. The batch size is set to 512, which is realized by accumulated gradient iterations [33] and distributed across 8 NVIDIA V100 GPUs. We trained QKFormer for 200 epochs. In addition, following DeiT [32], data augmentation techniques including RandAugment [34], random erasing [35], and stochastic depth [36] are employed in this study. The number of blocks in the three stages is set as {1, 2, 7} respectively. (A minimal sketch of the learning-rate scaling and gradient accumulation appears below the table.) |
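
The Experiment Setup row quotes two reproducibility-relevant details: a linear learning-rate scaling rule (actual LR = Batch Size / 256 × base LR) and a global batch of 512 built via gradient accumulation. The sketch below illustrates both in PyTorch. The stand-in model, micro-batch size, and accumulation split are illustrative assumptions, not the authors' code; the real implementation is at https://github.com/zhouchenlin2096/QKFormer.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for QKFormer, used only to make the sketch runnable.
model = nn.Linear(3 * 32 * 32, 10)

# Linear LR scaling from the paper: actual_lr = (batch_size / 256) * base_lr.
base_lr = 6e-4
batch_size = 512
actual_lr = base_lr * batch_size / 256  # 1.2e-3 for a global batch of 512

optimizer = torch.optim.AdamW(model.parameters(), lr=actual_lr)
criterion = nn.CrossEntropyLoss()

# Gradient accumulation: the global batch of 512 is assembled from smaller
# micro-batches (micro_batch * accum_steps == batch_size). The 128/4 split
# is an assumption for illustration; the paper does not specify it.
micro_batch, accum_steps = 128, 4

optimizer.zero_grad()
for _ in range(accum_steps):
    x = torch.randn(micro_batch, 3 * 32 * 32)    # dummy inputs
    y = torch.randint(0, 10, (micro_batch,))     # dummy labels
    loss = criterion(model(x), y) / accum_steps  # average across micro-batches
    loss.backward()                              # gradients accumulate in .grad
optimizer.step()                                 # one update per global batch
```

Dividing each micro-batch loss by `accum_steps` makes the accumulated gradient match what a single pass over the full 512-sample batch would produce, so the scaled learning rate remains appropriate.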