Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Cross-Modal Representational Knowledge Distillation for Enhanced Spike-informed LFP Modeling

Authors: Eray Erturk, Saba Hashemi, Maryam M. Shanechi

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results show that the Distilled LFP models consistently outperform single- and multi-session LFP baselines in both fully unsupervised and supervised settings, and can generalize to other sessions without additional distillation while maintaining superior performance. These findings demonstrate that cross-modal knowledge distillation is a powerful and scalable approach for leveraging high-performing spike models to develop more accurate LFP models.
Researcher Affiliation | Academia | Eray Erturk (1), Saba Hashemi (2), Maryam M. Shanechi (1,2,3,4). (1) Ming Hsieh Department of Electrical and Computer Engineering, (2) Thomas Lord Department of Computer Science, (3) Alfred E. Mann Department of Biomedical Engineering, (4) Neuroscience Graduate Program; University of Southern California, Los Angeles, CA.
Pseudocode | No | The paper describes methodologies and architectures (e.g., Fig. 1, Fig. 2, Fig. 7) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We release models and an inference notebook for reproducibility at https://github.com/ShanechiLab/CrossModalDistillation.
Open Datasets | Yes | All datasets used in this study are publicly available datasets as shown in Table 2, whose experimental and preprocessing details are as follows.
Dataset Splits | Yes | For each dataset, we applied a random 80%/20% train-test split at the segment (sequence) level.
Hardware Specification | Yes | All models were trained in a cluster with 8 NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions techniques and optimizers such as "automatic mixed precision (AMP) training", "flash-attention mechanism [82]", "rotary positional embeddings [81]", and "AdamW optimizer [83]", but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For all models, we employed 10-layer transformer encoder backbones (except the scaling analysis in Fig. 10) and a hidden state dimension of 256. We used a sequential learning rate composed of 1) a linear warmup learning rate with a start factor of 0.3 that reached its maximum value of 0.000625 over 30 epochs, and 2) an exponential learning rate with a decay factor of 0.995. We used the AdamW optimizer [83] with a weight decay factor starting from 0.1 and reaching its maximum value of 0.4 over 1000 epochs (which was never achieved for any model). For models trained/fine-tuned on the MAE objective, we used a masking probability of 0.6 that is randomly applied on neural patches across space and time, and used 4-layer transformer predictors with a hidden state dimension of 192, followed by a 64-dimensional down-projection. For supervised fine-tuning (combination of MAE and behavior regression objectives), we did not apply any scaling on the two loss terms, and for the distillation objective, we scaled the representation alignment objective by a coefficient of 5.
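
The learning-rate and weight-decay schedules quoted in the Experiment Setup row can be made concrete with a short sketch. The paper's own implementation is not shown here, so the following is only an illustration under stated assumptions: schedules are stepped once per epoch, epochs are 0-indexed, and the weight decay ramps linearly (the quoted text gives the start and end values but not the shape of the ramp). The function names `learning_rate` and `weight_decay` are hypothetical.

```python
# Minimal sketch of the quoted schedules, in plain Python.
# Assumptions (not confirmed by the source): per-epoch stepping,
# 0-indexed epochs, and a linear weight-decay ramp.

MAX_LR = 0.000625      # maximum learning rate reached after warmup
START_FACTOR = 0.3     # warmup starts at 0.3 * MAX_LR
WARMUP_EPOCHS = 30     # length of the linear warmup
DECAY_FACTOR = 0.995   # per-epoch exponential decay after warmup


def learning_rate(epoch: int) -> float:
    """Learning rate at a given epoch: linear warmup, then exponential decay."""
    if epoch < WARMUP_EPOCHS:
        # Multiplier grows linearly from START_FACTOR to 1.0.
        factor = START_FACTOR + (1.0 - START_FACTOR) * epoch / WARMUP_EPOCHS
        return MAX_LR * factor
    # Decay is counted from the end of the warmup phase.
    return MAX_LR * DECAY_FACTOR ** (epoch - WARMUP_EPOCHS)


def weight_decay(epoch: int, start: float = 0.1, end: float = 0.4,
                 ramp_epochs: int = 1000) -> float:
    """Weight decay at a given epoch. The linear ramp is an assumption."""
    progress = min(epoch / ramp_epochs, 1.0)
    return start + (end - start) * progress


if __name__ == "__main__":
    for epoch in (0, 15, 30, 100, 300):
        print(epoch, learning_rate(epoch), weight_decay(epoch))
```

Under these assumptions the learning rate starts at 0.3 × 0.000625 = 0.0001875, peaks at epoch 30, and decays by 0.5% per epoch afterwards. The weight decay only reaches 0.4 at epoch 1000, which matches the remark that the maximum was never reached in practice.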