Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video

Authors: Xueyang Yu, Cheng Shi, Yang Wang, Sibei Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate and address these properties, we first introduce ESTP-Bench (Ego Streaming Proactive Benchmark) alongside the ESTP-F1 metric, a novel framework designed for their rigorous assessment. Secondly, we propose a comprehensive technical pipeline to enable models to tackle this challenging task. ... Our proposed model effectively addresses these critical properties while outperforming multiple baselines across diverse online and offline benchmarks. 4 Experiment
Researcher Affiliation | Academia | 1 ShanghaiTech University; 2 School of Computer Science and Engineering, Sun Yat-sen University; 3 School of Computing and Data Science, The University of Hong Kong
Pseudocode | No | The paper describes methods and pipelines in textual form and diagrams (e.g., Fig. 4) but does not include any clearly labeled pseudocode or algorithm blocks. Descriptions are provided in regular paragraph text.
Open Source Code | No | Answer: [No] Justification: Code will be released after acceptance.
Open Datasets | Yes | Data Source is validation set of Ego4D [17, 44] that includes raw annotations such as event narrations and steps for completing consistent goals.
Dataset Splits | Yes | To address this novel task, we propose a comprehensive and novel technical pipeline including a data engine, multi-stage training strategies, and a proactive dynamic compression technique to enhance the streaming video LLMs. Specifically, ... For the data engine, utilizing the Ego4D [17] training set and a three-stage generation pipeline as introduced in Sec. 1, we generate 60K single-turn and 20K multi-turn questions, as shown in Fig. 4. Each generated instance includes questions, answers, and their corresponding valid answer intervals (named as ESTP-IT). See Appendix for data engine details.
Hardware Specification | Yes | Action Per Second versus ESTP Score for various models, measured on an A40 GPU, demonstrating synchronization efficiency challenges.
Software Dependencies | No | The paper mentions base models like LLaMA3 and SigLIP, and training techniques like LoRA, but does not provide specific version numbers for software libraries, programming languages, or other ancillary software dependencies.
Experiment Setup | Yes | In this section, we provide detailed implementation configurations of our training methodology. ESTP-Bench. Our training methodology employs a three-stage strategy to progressively endow the VideoLLM-EyeWO model with advanced proactive capabilities. Tab. 6 summarizes key details of each stage's specific configuration and learning objectives, while Tab. 5 presents the corresponding ablation results. Table 6: Multi-Stage Training Plan in ESTP (includes Batch Size per Device, Gradient Accumulation, Learning Rate, Warm-up Ratio, LR Scheduler, Optimizer, Epochs, Precision).