Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Spiking Neural Networks Need High-Frequency Information
Authors: Yuetong Fang, Deming Zhou, Ziqing Wang, Hongwei Ren, Zeng Zecui, Lusong Li, Shibo Zhou, Renjing Xu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, on Spiking Transformers, adopting Avg-Pooling (low-pass) for token mixing lowers performance to 76.73% on CIFAR-100, whereas replacing it with Max-Pool (high-pass) pushes the top-1 accuracy to 79.12%. Accordingly, we introduce Max-Former that restores high-frequency signals through two frequency-enhancing operators... Notably, Max-Former attains 82.39% top-1 accuracy on ImageNet using only 63.99M parameters, surpassing Spikformer (74.81%, 66.34M) by +7.58%... We provide the first theoretical proof that spiking neurons inherently act as low-pass filters at the network level, revealing their tendency to suppress high-frequency features. We propose Max-Former, which restores high-frequency information in Spiking Transformers via two lightweight modules: extra Max-Pool in patch embedding and Depth-Wise Convolution (DWC) in place of early-stage self-attention. Restoring high-frequency information significantly improves performance while saving energy cost. On ImageNet, Max-Former achieves 82.39% top-1 accuracy (+7.58% over Spikformer) with 30% energy consumption and lower parameter count (63.99M vs. 66.34M). Extending the insight beyond transformers, Max-ResNet-18 achieves state-of-the-art performance on convolution-based benchmarks: 97.17% on CIFAR-10 and 83.06% on CIFAR-100. |
| Researcher Affiliation | Collaboration | ¹The Hong Kong University of Science and Technology (Guangzhou); ²Brain Mind Innovation INC; ³JD Explore Academy; ⁴Northwestern University |
| Pseudocode | No | The paper provides mathematical formulations and descriptions of algorithms, such as the LIF model and spiking DWC, but does not present any explicitly labeled pseudocode or algorithm blocks in a structured format. |
| Open Source Code | Yes | Code implementation is available at https://github.com/bic-L/MaxFormer. |
| Open Datasets | Yes | We evaluate Max-Former through comprehensive experiments on static datasets (CIFAR-10 [36], CIFAR-100 [37] and ImageNet [38]) and neuromorphic datasets (CIFAR10-DVS [39], DVS128 Gesture [40]). |
| Dataset Splits | Yes | ImageNet-1k [38] is one of the most widely used datasets in computer vision. It contains 1.28 million training, 50,000 validation, and 100,000 test images covering the common 1K classes. Both CIFAR-10 [36] and CIFAR-100 [37] include 50,000 training images and 10,000 testing images with 32 × 32 resolution. The main difference between them is that CIFAR-10 has 10 categories for classification, while CIFAR-100 has 100 categories. Neuromorphic Datasets: For event-based vision tasks, we evaluate on two standard benchmarks. CIFAR10-DVS [39] is an event-based version of the CIFAR-10 dataset, created by capturing moving image samples using the Dynamic Vision Sensor (DVS). It includes 10,000 event-based images (128 × 128 pixels) spread across 10 classes, with 9,000 samples for training and 1,000 for testing. The DVS128 Gesture dataset [40] contains 1,342 event-based recordings of 11 different hand gesture types performed by 29 people under 3 different lighting conditions. Each gesture recording lasts about 6 seconds on average. |
| Hardware Specification | Yes | For our ImageNet experiments, we used 8 NVIDIA A30 GPUs to train most models. However, for the MaxFormer-10-512 (T=4) and MaxFormer-10-768 (T=4) models, we used 8 NVIDIA H20 GPUs instead. For the smaller datasets (CIFAR10, CIFAR100, DVS128 Gesture, and CIFAR10-DVS), we used a single A30 GPU for training. All tests were conducted on a CentOS 7.9 server equipped with the Intel Xeon Gold 6348 CPU (2.60GHz) and the NVIDIA A30 GPU. |
| Software Dependencies | No | The paper states: "The training and inference pipeline are implemented in SpikingJelly [48]." While a software library is mentioned, a specific version number for SpikingJelly or any other dependencies like Python or PyTorch is not provided. |
| Experiment Setup | Yes | Our training scheme mainly follows [24] and [41]. Specifically, MixUp [49], CutMix [50] and RandAugment [51] are used for data augmentation. The models are trained using AdamW optimizer [52] with the weight decay of 0.05 for ImageNet-1K classification tasks and the weight decay of 0.06 for all other datasets. Label Smoothing [53] is set as 0.1. Detailed training hyperparameters are shown in Table 6. Table 6 includes: Model Size, Epochs, Resolution, Batch Size, Optimizer, Learning rate, Learning rate decay, Warmup epochs, Weight decay, RandAugment, Mixup, CutMix, Mixup prob, Erasing prob, Label smoothing. |
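
The Research Type row quotes the paper's description of Avg-Pooling as a low-pass operator and Max-Pool as a high-pass one. The following is a minimal, standard-library-only sketch of the intuition behind that description, not the authors' code: it treats 1D average pooling as a boxcar filter, evaluates that filter's gain at zero frequency and at the Nyquist frequency, and compares how average and max pooling treat a single sharp spike. The function names, the window size `K = 4`, and the toy signal are all assumptions chosen for illustration.

```python
import cmath
import math


def avg_pool_1d(x, k):
    """Sliding-window mean with stride 1 and no padding."""
    return [sum(x[i:i + k]) / k for i in range(len(x) - k + 1)]


def max_pool_1d(x, k):
    """Sliding-window maximum with stride 1 and no padding."""
    return [max(x[i:i + k]) for i in range(len(x) - k + 1)]


def boxcar_gain(k, omega):
    """Magnitude response of the length-k averaging kernel.

    omega is the angular frequency in radians per sample
    (0 is the constant component, pi is the Nyquist frequency).
    """
    return abs(sum(cmath.exp(-1j * omega * n) for n in range(k)) / k)


K = 4  # illustrative window size, not taken from the paper

# Toy input: a flat signal with one sharp transient (rich in high frequencies).
signal = [0.0] * 16
signal[8] = 1.0

avg_out = avg_pool_1d(signal, K)
max_out = max_pool_1d(signal, K)

dc_gain = boxcar_gain(K, 0.0)           # constant inputs pass unchanged
nyquist_gain = boxcar_gain(K, math.pi)  # fastest oscillation is cancelled

print("peak after average pooling:", max(avg_out))
print("peak after max pooling:", max(max_out))
print("boxcar gain at DC:", dc_gain)
print("boxcar gain at Nyquist:", nyquist_gain)
```

Average pooling is a linear filter, so its frequency response is well defined, and for an even window the gain at the Nyquist frequency vanishes. Max pooling is nonlinear and has no frequency response in that sense; the sketch only shows that it keeps the full amplitude of the spike that averaging spreads out and attenuates.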