Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Toward Relative Positional Encoding in Spiking Transformers

Authors: Changze Lv, Yansen Wang, Dongqi Han, Yifei Shen, Xiaoqing Zheng, Xuanjing Huang, Dongsheng Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our RPE methods on various tasks, including time series forecasting, text classification, and patch-based image classification, and the experimental results demonstrate a satisfying performance gain by incorporating our RPE methods across many architectures.
Researcher Affiliation	Collaboration	¹College of Computer Science and Artificial Intelligence, Fudan University; ²Microsoft Research Asia
Pseudocode	No	The paper includes figures (Figure 1 and Figure 2) that illustrate concepts and mechanisms through diagrams and equations, but it does not contain explicit pseudocode blocks or algorithms formatted with structured steps labeled as "Pseudocode" or "Algorithm" for its proposed methods.
Open Source Code	Yes	Our code is available at https://github.com/microsoft/SeqSNN.
Open Datasets	Yes	To evaluate the RPE capabilities of the compared models, we conduct experiments on two sequential tasks: time-series forecasting and text classification. Following [6], we choose 4 real-world datasets for time-series forecasting: Metr-la [27], Pems-bay [27], Electricity [28], Solar [28]. For text classification, we follow [7] and conduct experiments on six benchmark datasets: Movie Reviews [29], SST-2 [30], SST-5, Subj, ChnSenti, and Waimai. Additionally, to demonstrate the versatility of our RPE method in image processing, we perform patch-based image classification experiments on two static datasets, CIFAR and Tiny-ImageNet, and one neuromorphic dataset, CIFAR10-DVS [2].
Dataset Splits	Yes	Table S3 (statistics of time-series datasets; columns: samples, variables, observation length, train-valid-test ratio): Metr-la: 34,272 samples, 207 variables, length 12 (short-term), ratio (0.7, 0.2, 0.1); Pems-bay: 52,116 samples, 325 variables, length 12 (short-term), ratio (0.7, 0.2, 0.1); Solar-energy: 52,560 samples, 137 variables, length 168 (long-term), ratio (0.6, 0.2, 0.2); Electricity: 26,304 samples, 321 variables, length 168 (long-term), ratio (0.6, 0.2, 0.2). AGNEWS [32] is a large-scale text classification benchmark derived from AG's corpus of news articles, containing 120,000 training samples and 7,600 valid samples evenly distributed across four categories: World, Sports, Business, and Science/Technology. IMDB [33] is a benchmark for binary sentiment classification, containing 50,000 movie reviews labeled as positive or negative, split evenly into training and test sets to evaluate natural language understanding and opinion mining models.
Hardware Specification	Yes	We conducted time-series forecasting experiments on 24G-V100 GPUs. We conducted text classification experiments on 4 RTX-3090 GPUs. [...] we choose to use one 80G-A100 GPU.
Software Dependencies	No	The paper mentions specific optimizers like "Adam [41]" and "AdamW [43]", and the use of "BERT-Tokenizer in Huggingface". While these indicate libraries and tools, specific version numbers for Python, PyTorch, Huggingface, or other key software dependencies are not provided in the text.
Experiment Setup	Yes	We set the training batch size as 32 and adopt Adam [41] optimizer with a cosine scheduler of learning rate 1e-4. An early stopping strategy with a tolerance of 30 epochs is adopted. For other configurations, we honestly follow the SeqSNN framework proposed by [6]. All Spikformers are with 12 encoder blocks and 768 feature embedding dimension. We directly trained Spikformers with arctangent surrogate gradients on all datasets. We use the BERT-Tokenizer in Huggingface to tokenize the sentences to token sequences. We pad all samples to the same sequence length of 256. We conducted text classification experiments on 4 RTX-3090 GPUs, and set the batch size as 32, optimizer as AdamW [43] with weight decay of 5e-3, and set a cosine scheduler of starting learning rate of 5e-4. What's more, in order to speed up the training stage, we adopt the automatic mixed precision training strategy.
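The split ratios in Table S3 and the forecasting schedule described under Experiment Setup (cosine learning rate starting at 1e-4, early stopping with a 30-epoch tolerance) can be illustrated with a small, framework-free sketch. It rests on assumptions the paper does not state: the time-series splits are taken as contiguous in time with the test part absorbing rounding, the cosine schedule is taken to anneal from the base rate to zero without warmup, and the tolerance is taken to be measured on validation loss. The names `split_sizes`, `cosine_lr`, and `EarlyStopping` are hypothetical and are not taken from the SeqSNN code.

```python
import math
from dataclasses import dataclass


def split_sizes(n_samples: int, ratios: tuple) -> tuple:
    """Hypothetical helper: sizes of a chronological train/valid/test split.

    Assumes the three parts are contiguous in time; the test part takes
    whatever remains after rounding so the sizes always sum to n_samples.
    """
    train = round(n_samples * ratios[0])
    valid = round(n_samples * ratios[1])
    return train, valid, n_samples - train - valid


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Hypothetical cosine annealing from base_lr to min_lr (no warmup assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


@dataclass
class EarlyStopping:
    """Hypothetical early stopping on validation loss with a fixed tolerance."""

    patience: int = 30  # tolerance of 30 epochs, as reported for forecasting
    best: float = math.inf
    bad_epochs: int = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True once training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Metr-la from Table S3: 34,272 samples with a (0.7, 0.2, 0.1) ratio.
print(split_sizes(34272, (0.7, 0.2, 0.1)))  # (23990, 6854, 3428)
# Reported forecasting learning rate 1e-4 at the start of the schedule.
print(cosine_lr(0, 100, 1e-4))  # 0.0001
```

Rounding the train and validation parts and giving the remainder to the test part is a choice made for this sketch so the three sizes always add up to the sample count; the SeqSNN framework may slice its splits differently, so the actual released code should be consulted when reproducing the reported numbers.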