Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks
Authors: Yuhang Li, Yufei Guo, Shanghang Zhang, Shikuang Deng, Yongqing Hai, Shi Gu
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over several popular network structures show that training SNN with Dspike consistently outperforms the state-of-the-art training methods. For example, on the CIFAR10-DVS classification task, we can train a spiking ResNet-18 and achieve 75.4% top-1 accuracy with 10 time steps. |
| Researcher Affiliation | Academia | Yuhang Li¹, Yufei Guo², Shanghang Zhang³, Shikuang Deng¹, Yongqing Hai², Shi Gu¹; ¹University of Electronic Science and Technology of China, ²Peking University, ³University of California Berkeley |
| Pseudocode | Yes | Algorithm 1: Training SNN with Dspike gradient. Input: SNN to be trained; training dataset; total training epochs $E$; total training iterations in one epoch $I$; FDG step size $\varepsilon$; temperature update step size $\Delta b$; initialized temperature $b^0$. For all $e = 1, 2, \ldots, E$-th epoch do: collect input data and labels, computing the FDG ($\hat{\nabla}_{\varepsilon,w} L$) of the first layer using $\varepsilon$; for all $b$ in $\{b^{e-1},\ b^{e-1} + \Delta b,\ b^{e-1} - \Delta b\}$ do: compute Dspike surrogate gradient $\nabla_{b,w} L$ and $\mathrm{cos\_sim}(\hat{\nabla}_{\varepsilon,w} L, \nabla_{b,w} L)$; find the optimal temperature with the highest cosine similarity and update it to $b^e$; for all $i = 1, 2, \ldots, I$-th iteration do: get training data and labels, compute Dspike surrogate gradient using optimal $b$; descend loss function and update weights. Return trained SNN. |
| Open Source Code | Yes | Our code will be available in the supplemental. |
| Open Datasets | Yes | We use CIFAR10/100 [34], ImageNet [35], and CIFAR10-DVS [45]. |
| Dataset Splits | No | The paper specifies training and test set sizes (e.g., '50K training and 10K test images' for CIFAR, '9k training images and 1k test images' for CIFAR10-DVS, '1250k training images and 50k test images' for ImageNet), but it does not explicitly mention or quantify a separate validation split for these datasets. |
| Hardware Specification | Yes | We run the model with 4 GTX 1080Tis. |
| Software Dependencies | No | The paper mentions general ML frameworks (Pytorch [5], JAX [6], and Tensorflow [7]) in the introduction, but it does not provide specific software dependencies with version numbers for its own implementation or experiments. |
| Experiment Setup | Yes | We use AutoAugment [37] and Cutout [38] for data augmentation. We use ResNet-18 architecture for running experiments. We train the corresponding ANN first and use it to initialize the first SNN with (T = 6), then we use Time Inheritance Training to gradually reduce the time step to 2. For FDG computation, we use the best practice ε = 0.1 in the toy experiments. The temperature is initialized to 1, and we set the update step size to Δb = 0.2. We adopt SGD optimizer with 0.9 momentum. In the first round of TIT, we train the model for 300 epochs with a learning rate of 0.01 cosine decayed [39] to 0. In the rest rounds of TIT, we only train the model for 50 epochs with a learning rate of 0.004. Weight decay (L2 regularization) is set to 0.0001. |
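
The per-epoch temperature update quoted in the Pseudocode row can be illustrated with a minimal sketch. It is not the authors' implementation. It only shows the selection step: among the previous temperature and its two neighbors one step away, keep the candidate whose surrogate gradient has the highest cosine similarity with the finite-difference gradient (FDG). The names `select_temperature`, `fdg`, `surrogate_grad`, and `toy_grad` are hypothetical, and gradients are assumed to be flattened into plain lists of floats.

```python
import math
from typing import Callable, Sequence


def cos_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_temperature(
    b_prev: float,
    delta_b: float,
    fdg: Sequence[float],
    surrogate_grad: Callable[[float], Sequence[float]],
) -> float:
    """Pick the candidate temperature whose surrogate gradient best aligns with the FDG.

    `surrogate_grad` is a hypothetical callable that maps a temperature to the
    flattened first-layer surrogate gradient; how it is computed is left to the model.
    """
    candidates = [b_prev, b_prev + delta_b, b_prev - delta_b]
    return max(candidates, key=lambda b: cos_sim(fdg, surrogate_grad(b)))


# Toy usage with a made-up gradient function (an assumption for illustration only).
def toy_grad(b: float) -> list:
    return [1.0, b, b * b]


# Initial temperature 1 and step size 0.2, as quoted in the Experiment Setup row.
b_new = select_temperature(b_prev=1.0, delta_b=0.2, fdg=toy_grad(1.2), surrogate_grad=toy_grad)
print(b_new)
```

The sketch does not guard against the temperature reaching zero or becoming negative after repeated downward steps; whether and how the original method handles that case is not stated in the quoted text.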