Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks
Authors: Yuhang Li, Yufei Guo, Shanghang Zhang, Shikuang Deng, Yongqing Hai, Shi Gu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over several popular network structures show that training SNN with Dspike consistently outperforms the state-of-the-art training methods. For example, on the CIFAR10-DVS classification task, we can train a spiking ResNet-18 and achieve 75.4% top-1 accuracy with 10 time steps. |
| Researcher Affiliation | Academia | Yuhang Li¹, Yufei Guo², Shanghang Zhang³, Shikuang Deng¹, Yongqing Hai², Shi Gu¹ (¹University of Electronic Science and Technology of China, ²Peking University, ³University of California, Berkeley) |
| Pseudocode | Yes | Algorithm 1: Training SNN with Dspike gradient. Input: SNN to be trained; training dataset, total training epochs E, iterations per epoch I, FDG step size ε, temperature update step size Δb, initial temperature b₀. For each epoch e = 1, 2, ..., E: collect input data and labels and compute the finite-difference gradient ∇̂_{ε,w} L of the first layer using ε; for each candidate b in {b_{e−1}, b_{e−1} + Δb, b_{e−1} − Δb}, compute the Dspike surrogate gradient ∇_{b,w} L and cos_sim(∇̂_{ε,w} L, ∇_{b,w} L); select the temperature with the highest cosine similarity as b_e. Then, for each iteration i = 1, 2, ..., I: get training data and labels, compute the Dspike surrogate gradient using the optimal b, descend the loss function, and update the weights. Return the trained SNN. (Hedged code sketches of a temperature-controlled surrogate gradient and of this temperature search follow the table.) |
| Open Source Code | Yes | Our code will be available in the supplemental. |
| Open Datasets | Yes | We use CIFAR10/100 [34], ImageNet [35], and CIFAR10-DVS [45]. |
| Dataset Splits | No | The paper specifies training and test set sizes (e.g., '50K training and 10K test images' for CIFAR, '9k training images and 1k test images' for CIFAR10-DVS, '1250k training images and 50k test images' for ImageNet), but it does not explicitly mention or quantify a separate validation split for these datasets. |
| Hardware Specification | Yes | We run the model with 4 GTX 1080Tis. |
| Software Dependencies | No | The paper mentions general ML frameworks (PyTorch [5], JAX [6], and TensorFlow [7]) in the introduction, but it does not provide specific software dependencies with version numbers for its own implementation or experiments. |
| Experiment Setup | Yes | We use AutoAugment [37] and Cutout [38] for data augmentation. We use the ResNet-18 architecture for running experiments. We train the corresponding ANN first and use it to initialize the first SNN (T = 6); we then use Time Inheritance Training (TIT) to gradually reduce the time step to 2. For FDG computation, we use the best practice ε = 0.1 from the toy experiments. The temperature is initialized to 1, and we set the update step size to Δb = 0.2. We adopt the SGD optimizer with 0.9 momentum. In the first round of TIT, we train the model for 300 epochs with a learning rate of 0.01 cosine-decayed [39] to 0. In the remaining rounds of TIT, we only train the model for 50 epochs with a learning rate of 0.004. Weight decay (L2 regularization) is set to 0.0001. (A hedged optimizer/scheduler sketch follows the table.) |
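
The Dspike surrogate replaces the non-differentiable Heaviside derivative with a temperature-controlled one. The snippet below is a minimal PyTorch sketch of that idea, not the paper's exact Dspike function: the forward pass emits a binary spike, and the backward pass substitutes a tanh-shaped derivative whose sharpness is set by the temperature b. `TemperatureSpike` is a hypothetical name and the tanh surrogate is a stand-in.

```python
import torch

class TemperatureSpike(torch.autograd.Function):
    """Binary spike in the forward pass; temperature-controlled surrogate
    derivative in the backward pass. The tanh-shaped derivative below is a
    stand-in for the paper's Dspike function, not its exact form."""

    @staticmethod
    def forward(ctx, v, b):
        # v: membrane potential minus threshold, b: temperature (sharpness)
        ctx.save_for_backward(v)
        ctx.b = b
        return (v >= 0.0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        b = ctx.b
        # Derivative of a tanh sigmoid centered at the threshold;
        # larger b makes the surrogate sharper (closer to the true Heaviside).
        surrogate = (b / 2.0) * (1.0 - torch.tanh(b * v) ** 2)
        return grad_output * surrogate, None


# Usage inside a spiking neuron: spikes = TemperatureSpike.apply(membrane - threshold, b)
```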
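Algorithm 1 selects the temperature by comparing the surrogate gradient of the first layer against a finite-difference gradient (FDG) estimate. Below is a minimal sketch of that search, assuming a callable `loss_fn()` that runs a forward pass on a fixed batch and returns the loss, and a hypothetical `set_temperature(model, b)` helper that updates the temperature of all spiking neurons; it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def finite_difference_grad(loss_fn, weight, eps=0.1):
    """Central finite-difference estimate of dL/dweight (the FDG in Algorithm 1).
    Perturbing every element is expensive; the paper only does this for the
    first layer, once per epoch."""
    grad = torch.zeros_like(weight)
    with torch.no_grad():
        flat, gflat = weight.data.view(-1), grad.view(-1)
        for i in range(flat.numel()):
            old = flat[i].item()
            flat[i] = old + eps
            loss_plus = loss_fn().item()
            flat[i] = old - eps
            loss_minus = loss_fn().item()
            flat[i] = old
            gflat[i] = (loss_plus - loss_minus) / (2.0 * eps)
    return grad


def search_temperature(model, loss_fn, first_weight, b_prev, delta_b=0.2, eps=0.1):
    """Return the candidate temperature whose surrogate gradient has the
    highest cosine similarity with the finite-difference gradient."""
    fdg = finite_difference_grad(loss_fn, first_weight, eps)
    best_b, best_sim = b_prev, float("-inf")
    for b in (b_prev, b_prev + delta_b, b_prev - delta_b):
        set_temperature(model, b)     # hypothetical helper, not a real API
        model.zero_grad()
        loss_fn().backward()          # surrogate-gradient backward pass
        sim = F.cosine_similarity(fdg.view(1, -1),
                                  first_weight.grad.view(1, -1)).item()
        if sim > best_sim:
            best_b, best_sim = b, sim
    return best_b
```

Per the pseudocode row, this search would run once per epoch with ε = 0.1 and Δb = 0.2, after which the inner training loop proceeds with the selected temperature.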
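The Experiment Setup row reports SGD with momentum 0.9, weight decay 1e-4, and a learning rate of 0.01 cosine-decayed to 0 over 300 epochs for the first round of Time Inheritance Training. A minimal PyTorch sketch of that schedule, assuming a hypothetical `model` (a spiking ResNet-18) and a hypothetical `train_one_epoch` loop:

```python
import torch

# First TIT round: 300 epochs, lr 0.01 cosine-annealed to 0, SGD momentum 0.9,
# weight decay 1e-4. Later rounds reportedly use 50 epochs at lr 0.004.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300, eta_min=0.0)

for epoch in range(300):
    train_one_epoch(model, optimizer)   # hypothetical loop: forward, loss, backward, step
    scheduler.step()
```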