Spatio-Temporal Approximation: A Training-Free SNN Conversion for Transformers
Authors: Yizhou Jiang, Kunlin Hu, Tianren Zhang, Haichuan Gao, Yuqian Liu, Ying Fang, Feng Chen
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our pipeline to the Image Encoder of CLIP (Radford et al., 2021), a prevalent Language-Image model. This allows our converted model to leverage CLIP's powerful generalization abilities such as zero-shot classification. In comparison to conventional ResNet architectures, Transformers can better exploit large-scale pretraining to achieve superior performance. Furthermore, for a fair comparison with existing methods, we fine-tune the pretrained ViT on benchmarks like CIFAR and ImageNet, achieving state-of-the-art results of SNN with smaller conversion error and faster simulation. |
| Researcher Affiliation | Academia | 1Department of Automation, Tsinghua University, Beijing, China 2Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China 3College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China 4LSBDPA Beijing Key Laboratory, Beijing, China |
| Pseudocode | Yes | Algorithm 1: STA Conversion Pipeline; Algorithm 2: STA Inference |
| Open Source Code | Yes | Codes are available at https://github.com/ViviaHu/STA. |
| Open Datasets | Yes | Settings and Models. CLIP is a multi-modal ANN trained on image-text pairs with diversified Image Encoder backbones including ResNet and Vision Transformer (ViT). It performs various tasks based on natural language prompts. Since no existing methods directly convert Transformers, we use the pretrained ResNet-50 backbone for our baselines. Following standard CLIP configuration for zero-shot prediction, we evaluate on CIFAR-10/100, ImageNet-200 benchmarks, and distribution-shifted CIFAR-10.1/10.2 datasets. Details in Appendix G.1. |
| Dataset Splits | No | The paper specifies training and testing sets for datasets like CIFAR-10/100 (e.g., '50,000 training images and 10,000 testing images') but does not explicitly detail a separate validation split by providing specific percentages, counts, or a dedicated methodology for creating it. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions 'energy-efficient deployment on neuromorphic hardware' in a general sense. |
| Software Dependencies | No | The paper mentions various techniques and methods used (e.g., MMSE, signed neurons, burst spikes), but it does not specify any software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, or specific library versions). |
| Experiment Setup | Yes | Our work enables all Transformer computations in SNN to be conducted, but does not specify a particular conversion methodology. In practice, we combine prior techniques to complete the entire conversion, including MMSE (Li et al., 2021) to determine optimal neuron thresholds, signed neurons (Wang et al., 2022a) to handle negative weighted inputs, and burst spikes (Li & Zeng, 2022) to mitigate lagging inputs and reduce residual potentials. Implementation details are provided in Appendix F. |
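
For context on the "standard CLIP configuration for zero-shot prediction" quoted in the Open Datasets row, the sketch below reconstructs a minimal zero-shot classification loop with the ResNet-50 CLIP backbone. It is an assumed illustration using OpenAI's `clip` package, not the authors' released pipeline; the prompt template, class names, and the helper `zero_shot_predict` are hypothetical.

```python
import torch
import clip  # OpenAI's CLIP package (https://github.com/openai/CLIP)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)  # ResNet-50 backbone baseline

# Class names are turned into text prompts; images are classified by cosine
# similarity between image and text embeddings (standard zero-shot CLIP usage).
class_names = ["airplane", "automobile", "bird"]  # e.g. a few CIFAR-10 labels
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_features = model.encode_text(prompts)
    text_features /= text_features.norm(dim=-1, keepdim=True)

def zero_shot_predict(image):
    """Return the index of the most similar class prompt for one PIL image."""
    with torch.no_grad():
        img = preprocess(image).unsqueeze(0).to(device)
        img_features = model.encode_image(img)
        img_features /= img_features.norm(dim=-1, keepdim=True)
        logits = 100.0 * img_features @ text_features.T  # scaled cosine similarity
    return logits.argmax(dim=-1).item()
```
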
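The Experiment Setup row combines three prior techniques: MMSE calibration of neuron thresholds, signed neurons for negative weighted inputs, and burst spikes to flush residual potentials. The sketch below is a simplified, assumed illustration of how such pieces could fit together, not the implementation in the STA repository; `mmse_threshold`, `SignedBurstIFNeuron`, and all constants are hypothetical.

```python
import torch

def mmse_threshold(activations: torch.Tensor, timesteps: int = 32, n_grid: int = 100) -> float:
    """Grid-search a firing threshold minimizing the MSE between ANN activations
    and their clipped-and-quantized (spike-count) approximation, in the spirit
    of the MMSE calibration of Li et al. (2021)."""
    best_thr, best_err = activations.max().item(), float("inf")
    for ratio in torch.linspace(0.1, 1.0, n_grid):
        thr = (ratio * activations.max()).item()
        approx = torch.clamp(activations, 0.0, thr)                       # clip to [0, thr]
        approx = torch.round(approx / thr * timesteps) * thr / timesteps  # T-level quantization
        err = torch.mean((activations - approx) ** 2).item()
        if err < best_err:
            best_err, best_thr = err, thr
    return best_thr

class SignedBurstIFNeuron:
    """Integrate-and-fire neuron emitting signed spikes (to carry negative
    weighted inputs) and bursts of up to `max_burst` spikes per timestep
    (to reduce residual membrane potential)."""
    def __init__(self, threshold: float, max_burst: int = 4):
        self.threshold = threshold
        self.max_burst = max_burst
        self.membrane = None

    def step(self, weighted_input: torch.Tensor) -> torch.Tensor:
        if self.membrane is None:
            self.membrane = torch.zeros_like(weighted_input)
        self.membrane = self.membrane + weighted_input
        # Number of (signed) spikes released this timestep, capped by the burst limit.
        n_spikes = torch.clamp(
            torch.div(self.membrane.abs(), self.threshold, rounding_mode="floor"),
            max=self.max_burst,
        )
        spikes = torch.sign(self.membrane) * n_spikes
        self.membrane = self.membrane - spikes * self.threshold  # soft reset
        return spikes * self.threshold  # spike contribution in activation units
```

In this simplified view, `mmse_threshold` would be run once per layer on recorded ANN activations, and the resulting threshold passed to the corresponding spiking neuron used during SNN inference.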