Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Spiking Vision Transformer with Saccadic Attention
Authors: Shuai Wang, Malu Zhang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Yu Liang, Yimeng Shan, Qian Sun, Enqi Zhang, Yang Yang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across various visual tasks demonstrate that SNN-ViT achieves state-of-the-art performance with linear computational complexity. |
| Researcher Affiliation | Academia | ¹University of Electronic Science and Technology of China, ²Northumbria University, ³Liaoning Technical University |
| Pseudocode | No | The paper describes methods and models using mathematical equations and textual explanations, but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | SNN-ViT is evaluated on both static and neuromorphic datasets, including CIFAR10, CIFAR100 (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), and CIFAR10-DVS (Li et al., 2017). Specifically, for ImageNet, the input image size is 3×224×224, the batch size is 128, and training is conducted over 310 epochs. Our experimental results are summarized in Tables 1 and 2. We select two remote sensing datasets: NWPU VHR-10 (Cheng et al., 2017) and SSDD (Wang et al., 2019). |
| Dataset Splits | No | The paper mentions datasets like CIFAR10, CIFAR100, ImageNet, CIFAR10-DVS, NWPU VHR-10, and SSDD, but it does not explicitly provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or clear references to predefined splits for each dataset) within the main text. |
| Hardware Specification | Yes | Experiments are carried out on a high-performance computing platform equipped with an NVIDIA RTX 4090 GPU, using Stochastic Gradient Descent (SGD) as the optimization algorithm. |
| Software Dependencies | No | The paper mentions 'Stochastic Gradient Descent (SGD) as the optimization algorithm' and 'YOLO-v3 architecture' but does not provide specific software dependency names with version numbers for libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | Specifically, for ImageNet, the input image size is 3×224×224, the batch size is 128, and training is conducted over 310 epochs. The initial learning rate is set at 1×10⁻², adjusted according to a polynomial decay strategy. The entire training process spans 300 epochs on the NWPU-VHR-10 and SSDD datasets, ensuring comprehensive learning and adaptation to data characteristics. In this configuration, the expansion ratio of the Multi-Layer Perceptron (MLP) is fixed at 4 to achieve an optimal balance between computational efficiency and model performance. |
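The "Experiment Setup" row reports an initial learning rate of 1×10⁻² adjusted by a polynomial decay strategy over 300 epochs. A minimal sketch of that schedule is below; the initial learning rate and epoch count come from the paper, while the decay power (1.0) and final learning rate (0.0) are assumptions, since the paper does not state them.

```python
# Polynomial-decay learning-rate schedule sketch.
# INITIAL_LR and TOTAL_EPOCHS are stated in the paper;
# POWER and FINAL_LR are assumed defaults, not from the paper.
INITIAL_LR = 1e-2    # stated: initial learning rate of 1x10^-2
TOTAL_EPOCHS = 300   # stated: 300 epochs on NWPU-VHR-10 / SSDD
POWER = 1.0          # assumed: linear decay is a common default
FINAL_LR = 0.0       # assumed: decay to zero


def polynomial_decay_lr(epoch: int) -> float:
    """Learning rate at a given epoch under polynomial decay."""
    progress = min(epoch, TOTAL_EPOCHS) / TOTAL_EPOCHS
    return (INITIAL_LR - FINAL_LR) * (1.0 - progress) ** POWER + FINAL_LR


# Precompute the per-epoch schedule for the full training run.
schedule = [polynomial_decay_lr(e) for e in range(TOTAL_EPOCHS + 1)]
```

With a power of 1.0 this reduces to a linear ramp from 1×10⁻² down to 0 over the 300 epochs; other powers (e.g. 0.9, common in segmentation work) would curve the decay.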