Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Bipolar Self-attention for Spiking Transformers

Authors: Shuai Wang, Malu Zhang, Jingya Wang, Dehao Zhang, Yimeng Shan, Jieyuan (Eric) Zhang, Yichen Xiao, Honglin Cao, Haonan Zhang, Zeyu Ma, Yang Yang, Haizhou Li

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments show that BSA achieves substantial performance improvements across various tasks, including image classification, semantic segmentation, and event-based tracking. These results establish its potential as a fundamental building block for energy-efficient Spiking Transformers. 5 Experiments In image classification Task, we evaluate the proposed BSA module on ImageNet-1K [7] using three representative state-of-the-art (SOTA) Spiking Transformer architectures: Spikingformer [67], QKformer [68], and Spike-driven Transformer-V3 [52]. Additionally, we perform comprehensive comparative analyses against recent Spiking Transformers [70, 49, 30, 51]. As shown in Table 1, our BSA module consistently improves performance across all three Spiking Transformer architectures.
Researcher Affiliation Academia Shuai Wang1, Malu Zhang1,2, Jingya Wang1, Dehao Zhang1, Yimeng Shan1, Jieyuan Zhang1, Yichen Xiao1, Honglin Cao1, Haonan Zhang1, Zeyu Ma1, Yang Yang1, Haizhou Li2,3; 1University of Electronic Science and Technology of China, 2Shenzhen Loop Area Institute, 3The Chinese University of Hong Kong (Shenzhen). Corresponding author: EMAIL
Pseudocode No The paper describes the methodology using mathematical equations and textual explanations, for example, in Section 3 "Preliminary" and 4.2 "Bipolar Self-attention for Spiking Transformers". However, it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Yes, we have submitted the code for the experiments and will upload it to GitHub once the paper is accepted.
Open Datasets Yes Extensive experiments demonstrate that BSA achieves significant performance improvements across various advanced Spiking Transformers on ImageNet-1K. 5.1 Image Classification In image classification Task, we evaluate the proposed BSA module on ImageNet-1K [7] 5.2 Semantic Segmentation and Event-based Tracking Tasks To further validate the efficacy of the proposed BSA, we evaluate its performance on more regression tasks, such as Semantic Segmentation and Event-based Tracking tasks. For semantic segmentation, we employ the challenging ADE20K dataset [65]... Furthermore, we examine BSA's efficacy in event-based tracking... experiments across the FE108 [58], FELT [40], and VisEvent [41] datasets... 5.3 Ablation Study To validate the efficacy of BSA components, we conduct ablation studies on the CIFAR100 dataset [19]
Dataset Splits Yes For semantic segmentation, we employ the challenging ADE20K dataset [65], which comprises 20K and 2K images in the training and validation sets, respectively, covering 150 semantic categories. Dataset details include: FE108 [58]: 108 event sequences (3,000–5,000 frames each), annotated bounding boxes, diverse scenes, optimized for high-speed tracking; VisEvent [40]: 60 RGB-event synchronized sequences ( 250,000 frames total), varied indoor/outdoor conditions, cross-modal robustness benchmark; FELT [41]: 200 sequences, event accumulation in ultra-short windows (1–5 ms), challenging real-time scenarios (motion blur, occlusion, rapid motion). The training procedure includes 100 epochs for FE108 and VisEvent, and 300 epochs for FELT. Random sampling per epoch selects 60k image pairs (maximum interval of 200 frames) for FE108 and FELT, and 30k pairs for VisEvent, ensuring sample diversity.
Hardware Specification Yes All experiments are conducted on ImageNet-1K dataset using PyTorch framework. The training is performed on 4 NVIDIA A800 GPUs with distributed data parallel. Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: Yes, we have reported the model specifications of the computational resources used for the experiments and provided some experimental details in the appendix.
Software Dependencies No All experiments are conducted on ImageNet-1K dataset using PyTorch framework. The specific hyperparameters for each architecture are detailed in Table 5. Optimizer AdamW AdamW LAMB Base Learning rate 7e-6 6e-4 6e-4 While software like PyTorch and optimizers (AdamW, LAMB) are mentioned, no specific version numbers for these or other libraries/languages are provided.
Experiment Setup Yes Table 5: Comparison of Hyperparameters for Different Model Architectures (values listed as Spikingformer / QKformer / Spike-driven V3). Timestep: 4 / 4 / 4; Epochs: 100 / 200 / 200; Resolution: 224 / 224 / 224; Batch size: 64 / 100 / 600; Optimizer: AdamW / AdamW / LAMB; Base learning rate: 7e-6 / 6e-4 / 6e-4; Learning rate decay: Cosine / Layer-wise 1.0 / Layer-wise 1.0; Warmup epochs: 5 / 5 / 10; Weight decay: 5e-2 / 5e-2 / 0.05; RandAugment: rand-m9-mstd0.5-inc1 / rand-m9-mstd0.5-inc1 / rand-m9-mstd0.5-inc1; Mixup: 0.8 / 0 / 0; Cutmix: 1.0 / 0 / 0; Label smoothing: 0.1 / 0.1 / 0.1
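
For readers who want to reuse the Table 5 settings, the values can be written down as a small configuration, sketched below in plain Python. This is only an illustration and not the authors' code: the `TrainConfig` class, its field names, the `TABLE_5` dictionary, and the `lr_at_epoch` helper are all hypothetical. The schedule shape (linear warmup, then cosine decay to zero) is an assumption; Table 5 lists "Cosine" decay only for Spikingformer and does not specify how warmup is applied.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters transcribed from Table 5 (field names are hypothetical)."""
    timestep: int
    epochs: int
    resolution: int
    batch_size: int
    optimizer: str
    base_lr: float
    warmup_epochs: int
    weight_decay: float
    mixup: float
    cutmix: float
    label_smoothing: float = 0.1
    rand_augment: str = "rand-m9-mstd0.5-inc1"


# One entry per architecture column of Table 5.
TABLE_5 = {
    "Spikingformer": TrainConfig(
        timestep=4, epochs=100, resolution=224, batch_size=64,
        optimizer="AdamW", base_lr=7e-6, warmup_epochs=5,
        weight_decay=5e-2, mixup=0.8, cutmix=1.0),
    "QKformer": TrainConfig(
        timestep=4, epochs=200, resolution=224, batch_size=100,
        optimizer="AdamW", base_lr=6e-4, warmup_epochs=5,
        weight_decay=5e-2, mixup=0.0, cutmix=0.0),
    "Spike-driven V3": TrainConfig(
        timestep=4, epochs=200, resolution=224, batch_size=600,
        optimizer="LAMB", base_lr=6e-4, warmup_epochs=10,
        weight_decay=0.05, mixup=0.0, cutmix=0.0),
}


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    if epoch < cfg.warmup_epochs:
        # Ramp linearly so the last warmup epoch reaches base_lr.
        return cfg.base_lr * (epoch + 1) / cfg.warmup_epochs
    remaining = max(1, cfg.epochs - cfg.warmup_epochs)
    progress = (epoch - cfg.warmup_epochs) / remaining
    return 0.5 * cfg.base_lr * (1.0 + math.cos(math.pi * progress))
```

Calling `lr_at_epoch(TABLE_5["Spikingformer"], epoch)` inside a training loop would give a per-epoch learning rate under these assumptions; the "Layer-wise 1.0" entries for QKformer and Spike-driven V3 are not modeled here.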