SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

Authors: Xingrun Xing, Zheng Zhang, Ziyi Ni, Shitao Xiao, Yiming Ju, Siqi Fan, Yequan Wang, Jiajun Zhang, Guoqi Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate previous SNNs and SpikeLMs on a range of general language tasks, including discriminative and generative ones.
Researcher Affiliation | Academia | (1) Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) Beijing Academy of Artificial Intelligence.
Pseudocode | No | No structured pseudocode or algorithm blocks are present.
Open Source Code | Yes | Our code is available at https://github.com/XingrunXing/SpikeLM.
Open Datasets | Yes | In pretraining, we use the BooksCorpus (Zhu et al., 2015) and English Wikipedia (Devlin et al., 2018) as training data, including 800M and 2500M words respectively. In finetuning, we use the GLUE benchmark, training with the common settings of ANNs.
Dataset Splits | Yes | We follow the standard ANN-based BERT to develop SNN-based LIF-BERT and SpikeLM, which include two stages: pretraining and finetuning. In pretraining, we use the BooksCorpus (Zhu et al., 2015) and English Wikipedia (Devlin et al., 2018) as training data, including 800M and 2500M words respectively. In finetuning, we use the GLUE benchmark, training with the common settings of ANNs.
Hardware Specification | Yes | All SNN models are trained on a single node with 8 A800 GPUs.
Software Dependencies | No | The paper mentions PyTorch and SpikingJelly but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We set the maximum length of each sentence as 128 tokens. The batch size is set to 512 in training. The entire pretraining encompasses a total of 10^5 steps. The same as ANN conditions, we train SNNs with an AdamW optimizer with a 2 * 10^-4 peak learning rate and 0.01 weight decay. We adapt the learning rate by a linear schedule with 5000 warm-up steps. ... we maintain a constant learning rate of 2 * 10^-5 and a batch size of 32 for all subsets. ... For XSUM, CNN-DailyMail, and WMT16 datasets, we use the AdamW optimizer and train 20 epochs with a 128 batch size, and a peak learning rate of 3.5 * 10^-4, 7 * 10^-4, or 1 * 10^-4 respectively.
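
For readers checking the data setup listed under "Open Datasets" and "Dataset Splits", the sketch below loads the named corpora with the Hugging Face `datasets` library. The library choice, the dataset identifiers, and the SST-2 subset are illustrative assumptions; the paper only names BooksCorpus, English Wikipedia, and the GLUE benchmark, and does not describe its data-loading code.

```python
# Hypothetical data-loading sketch for the corpora named in the paper
# (BooksCorpus, English Wikipedia, GLUE). The Hugging Face dataset
# identifiers below are assumptions, not taken from the authors' code.
from datasets import load_dataset

# Pretraining corpora (~800M and ~2500M words, per the paper)
bookcorpus = load_dataset("bookcorpus", split="train")
wikipedia = load_dataset("wikipedia", "20220301.en", split="train")

# Finetuning: one GLUE subset as an example (SST-2 chosen arbitrarily)
glue_sst2 = load_dataset("glue", "sst2")

print(bookcorpus)
print(wikipedia)
print(glue_sst2)
```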
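
Similarly, the pretraining hyperparameters quoted in the "Experiment Setup" row can be summarized as a minimal PyTorch sketch. It assumes the linear schedule decays to zero after warm-up (the quoted text does not state the decay target) and uses a placeholder module in place of LIF-BERT / SpikeLM; it is not the authors' training code.

```python
# Minimal sketch of the reported pretraining optimization setup (not the authors' code).
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

MAX_SEQ_LEN = 128      # maximum sentence length in tokens
BATCH_SIZE = 512       # pretraining batch size
TOTAL_STEPS = 100_000  # 1e5 pretraining steps
WARMUP_STEPS = 5_000   # linear warm-up steps
PEAK_LR = 2e-4         # peak learning rate
WEIGHT_DECAY = 0.01    # AdamW weight decay

# Placeholder standing in for the SNN language model (LIF-BERT / SpikeLM).
model = torch.nn.Linear(768, 768)

optimizer = AdamW(model.parameters(), lr=PEAK_LR, weight_decay=WEIGHT_DECAY)

def linear_warmup_decay(step: int) -> float:
    """Linear warm-up to the peak LR, then (assumed) linear decay to zero."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))

scheduler = LambdaLR(optimizer, lr_lambda=linear_warmup_decay)

for step in range(TOTAL_STEPS):
    # batch = next(data_iterator)   # (BATCH_SIZE, MAX_SEQ_LEN) token ids in practice
    # loss = model(batch)...        # forward pass and loss computation omitted
    # loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```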