Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
SpikingVTG: A Spiking Detection Transformer for Video Temporal Grounding
Authors: Malyaban Bal, Brian Matejek, Susmit Jha, Adam Cobb
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SpikingVTG variants on moment retrieval and highlight detection tasks using the QVHighlights, Charades-STA, TACoS and Youtube Highlight datasets. Our model achieves competitive results compared to the current SOTA non-spiking models. Ablation Study: As demonstrated in Table 5, the inclusion of the SFG mechanism enhances performance compared to the model without SFG. |
| Researcher Affiliation | Collaboration | Malyaban Bal1,2, Brian Matejek2, Susmit Jha2, Adam D. Cobb2; 1The Pennsylvania State University; 2Computer Science Laboratory, SRI International |
| Pseudocode | No | The paper describes the architecture and mechanisms using textual descriptions, equations, and figures (e.g., Figure 1 for high-level overview, Figure 2 for SFG operations), but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We are uploading the code as part of the submission. If accepted we will make it open access on github. |
| Open Datasets | Yes | We evaluate SpikingVTG variants on moment retrieval and highlight detection tasks using the QVHighlights, Charades-STA, TACoS and Youtube Highlight datasets. QVHighlights: The QVHighlights dataset [1] stands out as the sole dataset providing annotations for both moment retrieval and highlight detection, making it an excellent resource for benchmarking on both the VTG tasks. Charades-STA: The Charades-STA [42] dataset comprises 16,128 indoor videos, each with an average duration of 30.6 seconds. TACoS: TACoS [43] consists of 127 videos, each averaging 4.78 minutes in length. Youtube Highlights: YouTube Highlights [44] consists of 433 videos across 6 domains, using the domain names as text queries. |
| Dataset Splits | Yes | Charades-STA: ...It includes 12,408 query-interval pairs designated for training and 3,720 query-interval pairs reserved for testing. TACoS: ...The dataset is split into 75 videos for training, 27 for validation, and 25 for testing. |
| Hardware Specification | Yes | The experiments were run on a NVIDIA RTX A6000 GPU with 48GB memory. The CPU utilized is an AMD Ryzen Threadripper 3960X 24-Core Processor. |
| Software Dependencies | No | We have used Python and the PyTorch framework to write the code. |
| Experiment Setup | Yes | We have used the Adam optimizer to train our model. We list the hyper-parameters used in the work in Table 6. We used grid search to find optimal values. Table 6: Hyper-parameters of our SpikingVTG model. Optimal values for QVHighlights dataset is also shown. |