Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
QKFormer: Hierarchical Spiking Transformer using Q-K Attention
Authors: Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | QKFormer achieves significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, at a size comparable to Spikformer (66.34 M parameters, 74.81%), QKFormer (64.96 M parameters) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1K, outperforming Spikformer by 10.84 percentage points. To the best of our knowledge, this is the first time that directly trained SNNs have exceeded 85% accuracy on ImageNet-1K. |
| Researcher Affiliation | Collaboration | Pengcheng Laboratory; Harbin Institute of Technology; Peking University |
| Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models are available at https://github.com/zhouchenlin2096/QKFormer. |
| Open Datasets | Yes | We evaluate QKFormer on static image classification and neuromorphic classification. The former includes ImageNet-1K [39] and CIFAR10/100 [40]. The latter contains CIFAR10-DVS [41] and DVS128 Gesture [42]. |
| Dataset Splits | Yes | ImageNet-1K contains 1.28 million images for training and 50k images for validation, with a total of 1,000 categories. |
| Hardware Specification | Yes | We use 8 NVIDIA Tesla V100 SXM2 32GB GPUs when training models on ImageNet, while 1 GPU is used to train on the other datasets (CIFAR10, CIFAR100, DVS128 Gesture, CIFAR10-DVS). |
| Software Dependencies | No | All experiments are implemented based on PyTorch [53], SpikingJelly [54] and Timm [55]. (Specific version numbers for PyTorch, SpikingJelly, and Timm are not provided.) |
| Experiment Setup | Yes | In this experiment, we use AdamW as the optimizer with a base learning rate of 6 × 10⁻⁴. The actual learning rate is the base learning rate multiplied by Batch Size/256. The batch size is set to 512, which is realized by accumulated gradient iterations [33] and distributed across 8 NVIDIA V100 GPUs. We trained QKFormer for 200 epochs. In addition, following DeiT [32], data augmentation and regularization techniques including RandAugment [34], random erasing [35], and stochastic depth [36] are employed in this study. The number of blocks in the three stages is set to {1, 2, 7}, respectively. |
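The learning-rate rule and the accumulated-gradient batch in the Experiment Setup row can be checked with a little arithmetic. The sketch below is a minimal illustration under stated assumptions, not the authors' training code. The helper names (`scaled_lr`, `accumulation_steps`, `accumulated_grad`) are hypothetical. The per-GPU micro-batch of 32 is an assumed value, because the setup does not report it. The toy scalar least-squares model exists only to show why averaging equal-sized micro-batch gradients reproduces the full-batch gradient.

```python
# Hypothetical sketch: linear learning-rate scaling and gradient-accumulation
# arithmetic for the reported setup (base lr 6e-4, target batch 512, 8 GPUs).
# Not the QKFormer training code; per-GPU micro-batch size is an assumption.

BASE_LR = 6e-4          # base learning rate from the setup
REFERENCE_BATCH = 256   # actual lr = base lr * batch_size / 256


def scaled_lr(base_lr: float, batch_size: int, reference: int = REFERENCE_BATCH) -> float:
    """Actual learning rate under the linear scaling rule."""
    return base_lr * batch_size / reference


def accumulation_steps(target_batch: int, num_gpus: int, per_gpu_batch: int) -> int:
    """Micro-batches to accumulate before each optimizer step.

    per_gpu_batch is hypothetical; the source does not state it.
    """
    per_step = num_gpus * per_gpu_batch
    if target_batch % per_step != 0:
        raise ValueError("target batch must be a multiple of num_gpus * per_gpu_batch")
    return target_batch // per_step


def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w (toy model, illustrative only)."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list[float], ys: list[float], micro: int) -> float:
    """Average the gradients of equal-sized micro-batches, as accumulation does
    when each micro-batch loss is divided by the number of accumulation steps."""
    chunks = [(xs[i:i + micro], ys[i:i + micro]) for i in range(0, len(xs), micro)]
    return sum(grad_mse(w, cx, cy) for cx, cy in chunks) / len(chunks)


if __name__ == "__main__":
    lr = scaled_lr(BASE_LR, 512)
    steps = accumulation_steps(512, num_gpus=8, per_gpu_batch=32)  # 32 is assumed
    print(f"actual lr = {lr:.1e}, accumulation steps = {steps}")

    xs = [float(i) for i in range(8)]
    ys = [2.0 * x + 1.0 for x in xs]
    print(grad_mse(0.5, xs, ys), accumulated_grad(0.5, xs, ys, micro=4))
```

With a batch size of 512, the rule doubles the base rate to 1.2 × 10⁻³. Under the assumed micro-batch of 32 per GPU, each optimizer step would accumulate two micro-batches. The equivalence shown by the toy model holds only when the micro-batches have equal size.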