Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Aha! - Predicting What Matters Next: Online Highlight Detection Without Looking Ahead

Authors: Aiden Chang, Celso de Melo, Stephanie Lukin

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments. This section details the comprehensive experimental evaluation of AHA. We first assess its core performance as an OHD model under strict streaming constraints on two standard HD benchmarks, TVSum and Mr.HiSum (Section 4.1). We then evaluate its robustness to common video degradations and conduct ablation studies to analyze the contributions of its key components (Section 4.2). To demonstrate its practical applicability in challenging real-world conditions, we further test AHA's capabilities on a long-form robotics video (Section 4.3), and generalization potential to other unoptimized video understanding tasks (Section 4.4). Our results are averaged over 5 runs.
Researcher Affiliation	Collaboration	Aiden Chang University of Southern California Los Angeles, CA 90089 EMAIL Celso De Melo DEVCOM Army Research Laboratory Adelphi, MD 20783 Stephanie M. Lukin DEVCOM Army Research Laboratory Adelphi, MD 20783
Pseudocode	No	No explicit pseudocode or algorithm blocks are provided in the main text or appendices. The methodology is described verbally and with architectural diagrams.
Open Source Code	Yes	github.com/aiden200/Aha- The instructions on how to download the dataset will be included in the GitHub repository. All code and data required to reproduce the main experimental results will be made publicly available upon acceptance, including training scripts, evaluation code, and documentation.
Open Datasets	Yes	We construct and release the Human Intuition Highlight Dataset (HIHD), a novel dataset of ~23k videos... AHA surpasses prior methods, including offline approaches, on the HD benchmarks TVSum [14] (+5.9% mAP) and Mr.HiSum [15] (+8.3% mAP).
Dataset Splits	Yes	Crucially, HIHD adopts the exact train/validation/test splits from Mr.HiSum to ensure fair comparability, and its training set explicitly excludes videos present in common highlight detection evaluation datasets.
Hardware Specification	Yes	Training was performed on 3 compute nodes, each with 2 NVIDIA A6000 GPUs (48 GB VRAM), totaling 6 GPUs. The system achieved a sustained throughput of 1 frame per second (FPS), demonstrating high efficiency with 100% peak GPU utilization and 90% peak memory controller utilization. During this process, the framework consumed a peak of 30.49 GB of VRAM across both GPUs and operated well within safe thermal limits at a peak temperature of 65 °C, all while maintaining a minimal system RAM footprint of 3.66 GB.
Software Dependencies	Yes	AHA was trained using PyTorch 2.5.1, Transformers 4.49.0, and CUDA 12.4 on Ubuntu 22.04.
Experiment Setup	Yes	Table 5: Key hyperparameters for training AHA, listed as Category: Hyperparameter (Value). Optimization: Optimizer (AdamW [45]); Betas, optimizer ((0.9, 0.999)); Epsilon, optimizer (1 × 10^-8); Weight decay (0.0); Learning rate (2 × 10^-5); LR scheduler (Cosine decay with linear warmup); Warmup ratio (0.05, with 0 warmup steps); Gradient norm clipping (1.0); Gradient checkpointing (Enabled). Batching: Per-device train batch size (1); Gradient accumulation steps (2, effective batch size = 2); Num epochs (1). Precision & Acceleration: BF16 training (Enabled); DeepSpeed (zero2 [46] + CPU offload); Attn implementation (FlashAttention2 [47]). Data loading: Dataloader workers (4); Pin memory (True); Drop last batch (False). Video preprocessing: Frame rate (1 fps); Frame resolution (384 × 384); Pooling stride (4); Frame tokens, # (49); Token pooling dims ([7, 7]). Model backbones: LLM backbone (lmms-lab/llava-onevision-qwen2-7b-ov); Vision backbone (google/siglip-large-patch16-384); Multimodal projector (3 × 3 conv + linear layers). Losses & regularization: Stream loss weight (1.0); TV loss window (49). Saving & logging: Save strategy (steps, every 25 steps); Save total limit (5 checkpoints); Logging strategy (steps, every 1 step).
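
For readers who want to see what the "cosine decay with linear warmup" entry in Table 5 implies, the following is a minimal sketch, not the authors' training code. It uses the reported peak learning rate (2 × 10^-5) and warmup ratio (0.05). The function name `lr_at_step` and the total step count of 1000 are hypothetical, chosen only for illustration; the page does not report the number of optimizer steps. The sketch also assumes the warmup length is derived from the ratio, which is one plausible reading of "0 warmup steps".

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.05):
    """Learning rate at a 0-indexed optimizer step.

    Linear warmup from 0 to base_lr, then cosine decay from base_lr to 0.
    base_lr and warmup_ratio come from Table 5; total_steps is an assumption.
    """
    # Assumption: warmup length is derived from the ratio, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, not reported on this page
    for s in (0, 25, 50, 500, 1000):
        print(s, lr_at_step(s, total))
```

Under these assumptions the rate starts at zero, reaches its peak when warmup ends, and falls back to zero at the final step.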