Sign Gradient Descent-based Neuronal Dynamics: ANN-to-SNN Conversion Beyond ReLU Network

Authors: Hyunseok Oh, Youngki Lee

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on large-scale datasets show that our technique achieves (i) state-of-the-art performance in ANN-to-SNN conversion and (ii) is the first to convert new DNN architectures, e.g., ConvNeXt, MLP-Mixer, and ResMLP. Also, from Section 6 (Evaluations): We demonstrate the practical effectiveness of our sign GD-based neuronal dynamics in four ways. First, we validate our support for diverse nonlinearities by converting new DNN architectures. Second, we compare the accuracy of converted ANNs with existing conversion techniques. Third, we verify our design choices through ablation studies. Finally, we visualize the effect of our technique on SNN inference speed.
Researcher Affiliation | Academia | Department of Computer Science & Engineering, Seoul National University, Seoul, Republic of Korea.
Pseudocode | Yes | Algorithm 4: ANN-to-SNN Conversion with sign GD-based Neuron (Definition 5.2)
Open Source Code | Yes | We publicly share our source code at www.github.com/snuhcs/snn_signgd.
Open Datasets | Yes | Experimental results on large-scale ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky et al., 2009) datasets
Dataset Splits | No | No explicit training/validation/test dataset splits (e.g., percentages or counts) are provided in the paper. It refers to the 'training dataset' and to 'random 100 batches' for normalization, but not to specific splits (see the calibration sketch below the table).
Hardware Specification | Yes | We run our experiments on a machine with AMD EPYC 7313 CPU, 512GB RAM, and NVIDIA RTX A6000.
Software Dependencies | No | The paper mentions the 'spikingjelly (Fang et al., 2023) implementation', 'torchvision (maintainers & contributors, 2016)', and 'timm (Wightman et al., 2019)' as software used, but specific version numbers for these components are not provided.
Experiment Setup | Yes | To train DNN models for CIFAR datasets, we use SGD with learning rate 0.1, momentum 0.9, weight decay 5e-4, and cosine annealing schedule (Loshchilov & Hutter, 2016) of T_max = 300. (An optimizer sketch follows below the table.)
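
A minimal sketch of how the 'random 100 batches' normalization pass mentioned in the Dataset Splits row could be reproduced. The dataset choice (CIFAR-10 via torchvision), batch size, and transform are assumptions for illustration only; the paper does not specify these details.

    import torch
    import torchvision
    import torchvision.transforms as T

    # Assumed calibration source: the CIFAR-10 training set loaded through torchvision.
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor())
    loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

    # Draw 100 randomly shuffled batches to serve as the normalization subset.
    calibration_batches = []
    for i, (images, _) in enumerate(loader):
        if i == 100:
            break
        calibration_batches.append(images)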
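
A hedged sketch of the CIFAR training recipe quoted in the Experiment Setup row (SGD, learning rate 0.1, momentum 0.9, weight decay 5e-4, cosine annealing with T_max = 300). The model and the epoch count are placeholders, not values stated in the paper.

    import torch
    import torchvision

    # Placeholder architecture; the paper trains its own DNN models for CIFAR.
    model = torchvision.models.resnet18(num_classes=10)

    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

    for epoch in range(300):  # epoch count assumed to match T_max
        # ... one training pass over the CIFAR training set goes here ...
        scheduler.step()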