SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
Authors: Malyaban Bal, Abhronil Sengupta
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work is the first one to demonstrate the performance of an operational spiking LM architecture on multiple different tasks in the GLUE benchmark. Our implementation source code is available at https://github.com/NeuroCompLab-psu/SpikingBERT. ... In this section, we demonstrate the performance of our proposed spiking LM and evaluate it against different tasks in the General Language Understanding Evaluation (GLUE) benchmark (Wang et al. 2018). ... Table 1: Results showing performance of our model (SpikingBERT4) against some standard models and other efficient implementations of BERT on GLUE evaluation set. |
| Researcher Affiliation | Academia | Malyaban Bal, Abhronil Sengupta School of Electrical Engineering and Computer Science The Pennsylvania State University University Park, PA 16802 mjb7906@psu.edu, sengupta@psu.edu |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks; the methods are described in narrative text and mathematical equations. (A hedged LIF-update sketch consistent with the Table 2 hyper-parameters follows the table.) |
| Open Source Code | Yes | Our implementation source code is available at https://github.com/NeuroCompLab-psu/SpikingBERT. |
| Open Datasets | Yes | In order to evaluate our model, we chose seven different types of tasks (six classification and one regression task) from the GLUE benchmark. We chose Quora Question Pairs (QQP), Microsoft Research Paraphrase Corpus (MRPC), and the Semantic Textual Similarity Benchmark (STS-B, the regression task) to evaluate our model on similarity and paraphrase tasks. For inference tasks, we opted for Multi-Genre Natural Language Inference (MNLI), Question-answering NLI (QNLI), and Recognizing Textual Entailment (RTE). For single-sentence sentiment analysis, we chose the Stanford Sentiment Treebank (SST-2). (A loading sketch with the standard GLUE splits follows the table.) |
| Dataset Splits | No | The paper names the datasets used but does not explicitly provide training/validation/test splits with percentages or sample counts. Evaluating on the GLUE benchmark implies the standard GLUE splits, but the paper does not define them. |
| Hardware Specification | Yes | The experiments were run on eight Nvidia RTX A5000 GPUs, each with 24 GB of memory. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). It implies the use of a BERT model but no specific software environment details are provided. |
| Experiment Setup | Yes | Table 2 reports hyper-parameters (explored range and optimal value) for SpikingBERT4, used across all datasets: Tconv, general KD: range 5-150, optimal 80; Tconv, task-based IKD: range 5-150, optimal 80; Vth (threshold voltage): range 0.25-5.0, optimal 1.0; γ (leak term): range 0.8-1.0, optimal 0.99 (LIF) and 1 (IF); t (temperature): range 0.1-10.0, optimal 1.0; batch size, general KD: range 8-256, optimal 128; batch size, task-based IKD: range 8-128, optimal [16, 32]; epochs, general KD: 5; epochs, task-based IKD: 20. (These values are collected into a config sketch after the table.) |
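Since the paper describes its neuron dynamics only in equations, here is a minimal sketch of one leaky integrate-and-fire (LIF) timestep, assuming the Table 2 values (Vth = 1.0, γ = 0.99; γ = 1 recovers the IF neuron). The function name, the reset-by-subtraction convention, and the input statistics are illustrative assumptions, not taken from the SpikingBERT codebase.

```python
import numpy as np

V_TH = 1.0    # threshold voltage (Table 2 optimal)
GAMMA = 0.99  # leak term for LIF; GAMMA = 1.0 gives the IF neuron (Table 2)

def lif_step(v, x):
    """One timestep: leak the membrane potential, integrate input x,
    emit a binary spike where the threshold is crossed, then soft-reset
    by subtracting the threshold (one common convention; the paper's
    exact reset rule is given in its equations)."""
    v = GAMMA * v + x
    spikes = (v >= V_TH).astype(x.dtype)
    v = v - spikes * V_TH
    return v, spikes

# Drive a small population for T timesteps and report average firing rates.
rng = np.random.default_rng(0)
v = np.zeros(4)
rates = np.zeros(4)
T = 80  # of the same order as Tconv in Table 2
for _ in range(T):
    v, s = lif_step(v, rng.uniform(0.0, 0.5, size=4))
    rates += s
print("average firing rates:", rates / T)
```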
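The seven GLUE tasks named above are all publicly available. Below is a minimal loading sketch, assuming the Hugging Face `datasets` library (which the paper does not mention) and the canonical GLUE splits that library exposes:

```python
from datasets import load_dataset

# The seven GLUE tasks evaluated in the paper.
TASKS = ["qqp", "mrpc", "stsb", "mnli", "qnli", "rte", "sst2"]

for task in TASKS:
    ds = load_dataset("glue", task)
    # MNLI exposes matched/mismatched validation and test sets;
    # the other tasks expose plain train/validation/test splits.
    print(task, {split: len(ds[split]) for split in ds})
```

Running this prints the standard split sizes for each task, which is the closest substitute for the split details the paper omits.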
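For convenience, the Table 2 optimal values can be collected into a single configuration object. The key names below are hypothetical; the paper reports values, not a schema.

```python
# Optimal hyper-parameters for SpikingBERT4 from Table 2 (explored ranges in comments).
SPIKINGBERT4_CONFIG = {
    "t_conv_general_kd": 80,          # convergence timesteps, general KD (range 5-150)
    "t_conv_task_ikd": 80,            # convergence timesteps, task-based IKD (range 5-150)
    "v_th": 1.0,                      # threshold voltage (range 0.25-5.0)
    "gamma_leak": 0.99,               # leak term: 0.99 for LIF, 1.0 for IF (range 0.8-1.0)
    "temperature": 1.0,               # distillation temperature t (range 0.1-10.0)
    "batch_size_general_kd": 128,     # range 8-256
    "batch_size_task_ikd": [16, 32],  # range 8-128
    "epochs_general_kd": 5,
    "epochs_task_ikd": 20,
}
```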