Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks
Authors: Yuhang Li, Yufei Guo, Shanghang Zhang, Shikuang Deng, Yongqing Hai, Shi Gu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over several popular network structures show that training SNN with Dspike consistently outperforms the state-of-the-art training methods. For example, on the CIFAR10-DVS classification task, we can train a spiking ResNet-18 and achieve 75.4% top-1 accuracy with 10 time steps. |
| Researcher Affiliation | Academia | Yuhang Li¹, Yufei Guo², Shanghang Zhang³, Shikuang Deng¹, Yongqing Hai², Shi Gu¹ (¹University of Electronic Science and Technology of China, ²Peking University, ³University of California, Berkeley) |
| Pseudocode | Yes | Algorithm 1: Training SNN with Dspike gradient. Input: SNN to be trained; training dataset, total training epochs E, iterations per epoch I, FDG step size ε, temperature update step size Δb, initial temperature b₀. For each epoch e = 1, 2, ..., E: collect input data and labels and compute the finite-difference gradient ∇̂_{ε,w} L of the first layer using ε; for each candidate b in {b_{e−1}, b_{e−1} + Δb, b_{e−1} − Δb}, compute the Dspike surrogate gradient ∇_{b,w} L and cos_sim(∇̂_{ε,w} L, ∇_{b,w} L); select the temperature with the highest cosine similarity as b_e. Then, for each iteration i = 1, 2, ..., I: get training data and labels, compute the Dspike surrogate gradient using the optimal b, descend the loss function, and update the weights. Return the trained SNN. (Hedged code sketches of a temperature-controlled surrogate gradient and of this temperature search follow the table.) |
| Open Source Code | Yes | Our code will be available in the supplemental. |
| Open Datasets | Yes | We use CIFAR10/100 [34], ImageNet [35], and CIFAR10-DVS [45]. |
| Dataset Splits | No | The paper specifies training and test set sizes (e.g., '50K training and 10K test images' for CIFAR, '9k training images and 1k test images' for CIFAR10-DVS, '1250k training images and 50k test images' for ImageNet), but it does not explicitly mention or quantify a separate validation split for these datasets. |
| Hardware Specification | Yes | We run the model with 4 GTX 1080Tis. |
| Software Dependencies | No | The paper mentions general ML frameworks (PyTorch [5], JAX [6], and TensorFlow [7]) in the introduction, but it does not provide specific software dependencies with version numbers for its own implementation or experiments. |
| Experiment Setup | Yes | We use AutoAugment [37] and Cutout [38] for data augmentation. We use the ResNet-18 architecture for running experiments. We train the corresponding ANN first and use it to initialize the first SNN (T = 6); we then use Time Inheritance Training (TIT) to gradually reduce the time step to 2. For FDG computation, we use the best practice ε = 0.1 from the toy experiments. The temperature is initialized to 1, and we set the update step size to Δb = 0.2. We adopt the SGD optimizer with 0.9 momentum. In the first round of TIT, we train the model for 300 epochs with a learning rate of 0.01 cosine-decayed [39] to 0. In the remaining rounds of TIT, we only train the model for 50 epochs with a learning rate of 0.004. Weight decay (L2 regularization) is set to 0.0001. (A hedged optimizer/scheduler sketch follows the table.) |
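
The Dspike surrogate replaces the non-differentiable Heaviside derivative with a temperature-controlled one. The snippet below is a minimal PyTorch sketch of that idea, not the paper's exact Dspike function: the forward pass emits a binary spike, and the backward pass substitutes a tanh-shaped derivative whose sharpness is set by the temperature b. `TemperatureSpike` is a hypothetical name and the tanh surrogate is a stand-in.

```python
import torch

class TemperatureSpike(torch.autograd.Function):
    """Binary spike in the forward pass; temperature-controlled surrogate
    derivative in the backward pass. The tanh-shaped derivative below is a
    stand-in for the paper's Dspike function, not its exact form."""

    @staticmethod
    def forward(ctx, v, b):
        # v: membrane potential minus threshold, b: temperature (sharpness)
        ctx.save_for_backward(v)
        ctx.b = b
        return (v >= 0.0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        b = ctx.b
        # Derivative of a tanh sigmoid centered at the threshold;
        # larger b makes the surrogate sharper (closer to the true Heaviside).
        surrogate = (b / 2.0) * (1.0 - torch.tanh(b * v) ** 2)
        return grad_output * surrogate, None


# Usage inside a spiking neuron: spikes = TemperatureSpike.apply(membrane - threshold, b)
```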
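Algorithm 1 selects the temperature by comparing the surrogate gradient of the first layer against a finite-difference gradient (FDG) estimate. Below is a minimal sketch of that search, assuming a callable `loss_fn()` that runs a forward pass on a fixed batch and returns the loss, and a hypothetical `set_temperature(model, b)` helper that updates the temperature of all spiking neurons; it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def finite_difference_grad(loss_fn, weight, eps=0.1):
    """Central finite-difference estimate of dL/dweight (the FDG in Algorithm 1).
    Perturbing every element is expensive; the paper only does this for the
    first layer, once per epoch."""
    grad = torch.zeros_like(weight)
    with torch.no_grad():
        flat, gflat = weight.data.view(-1), grad.view(-1)
        for i in range(flat.numel()):
            old = flat[i].item()
            flat[i] = old + eps
            loss_plus = loss_fn().item()
            flat[i] = old - eps
            loss_minus = loss_fn().item()
            flat[i] = old
            gflat[i] = (loss_plus - loss_minus) / (2.0 * eps)
    return grad


def search_temperature(model, loss_fn, first_weight, b_prev, delta_b=0.2, eps=0.1):
    """Return the candidate temperature whose surrogate gradient has the
    highest cosine similarity with the finite-difference gradient."""
    fdg = finite_difference_grad(loss_fn, first_weight, eps)
    best_b, best_sim = b_prev, float("-inf")
    for b in (b_prev, b_prev + delta_b, b_prev - delta_b):
        set_temperature(model, b)     # hypothetical helper, not a real API
        model.zero_grad()
        loss_fn().backward()          # surrogate-gradient backward pass
        sim = F.cosine_similarity(fdg.view(1, -1),
                                  first_weight.grad.view(1, -1)).item()
        if sim > best_sim:
            best_b, best_sim = b, sim
    return best_b
```

Per the pseudocode row, this search would run once per epoch with ε = 0.1 and Δb = 0.2, after which the inner training loop proceeds with the selected temperature.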
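The Experiment Setup row reports SGD with momentum 0.9, weight decay 1e-4, and a learning rate of 0.01 cosine-decayed to 0 over 300 epochs for the first round of Time Inheritance Training. A minimal PyTorch sketch of that schedule, assuming a hypothetical `model` (a spiking ResNet-18) and a hypothetical `train_one_epoch` loop:

```python
import torch

# First TIT round: 300 epochs, lr 0.01 cosine-annealed to 0, SGD momentum 0.9,
# weight decay 1e-4. Later rounds reportedly use 50 epochs at lr 0.004.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300, eta_min=0.0)

for epoch in range(300):
    train_one_epoch(model, optimizer)   # hypothetical loop: forward, loss, backward, step
    scheduler.step()
```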