Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that has been validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video
Authors: Xueyang Yu, Cheng Shi, Yang Wang, Sibei Yang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate and address these properties, we first introduce ESTP-Bench (Ego Streaming Proactive Benchmark) alongside the ESTP-F1 metric, a novel framework designed for their rigorous assessment. Secondly, we propose a comprehensive technical pipeline to enable models to tackle this challenging task. ... Our proposed model effectively addresses these critical properties while outperforming multiple baselines across diverse online and offline benchmarks. |
| Researcher Affiliation | Academia | ¹ShanghaiTech University; ²School of Computer Science and Engineering, Sun Yat-sen University; ³School of Computing and Data Science, The University of Hong Kong |
| Pseudocode | No | The paper describes methods and pipelines in textual form and diagrams (e.g., Fig. 4) but does not include any clearly labeled pseudocode or algorithm blocks. Descriptions are provided in regular paragraph text. |
| Open Source Code | No | Answer: [No] Justification: Code will be released after acceptance. |
| Open Datasets | Yes | The data source is the validation set of Ego4D [17, 44], which includes raw annotations such as event narrations and steps for completing consistent goals. |
| Dataset Splits | Yes | To address this novel task, we propose a comprehensive and novel technical pipeline including a data engine, multi-stage training strategies, and a proactive dynamic compression technique to enhance the streaming video LLMs. Specifically, ... For the data engine, utilizing the Ego4D [17] training set and a three-stage generation pipeline as introduced in Sec. 1, we generate 60K single-turn and 20K multi-turn questions, as shown in Fig. 4. Each generated instance includes questions, answers, and their corresponding valid answer intervals (named ESTP-IT). See Appendix for data engine details. |
| Hardware Specification | Yes | Action Per Second versus ESTP Score for various models, measured on an A40 GPU, demonstrating synchronization efficiency challenges. |
| Software Dependencies | No | The paper mentions base models such as LLaMA3 and SigLIP, and training techniques such as LoRA, but does not provide specific version numbers for software libraries, programming languages, or other ancillary software dependencies. |
| Experiment Setup | Yes | In this section, we provide detailed implementation configurations of our training methodology. ESTP-Bench. Our training methodology employs a three-stage strategy to progressively endow the VideoLLM-EyeWO model with advanced proactive capabilities. Tab. 6 summarizes key details of each stage's specific configuration and learning objectives, while Tab. 5 presents the corresponding ablation results. Table 6: Multi-Stage Training Plan in ESTP (includes Batch Size per Device, Gradient Accumulation, Learning Rate, Warm-up Ratio, LR Scheduler, Optimizer, Epochs, Precision). |
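The Experiment Setup row notes that Table 6 reports batch size per device, gradient accumulation, and warm-up ratio but, as quoted here, not the resulting effective batch size or step counts. A reproducer usually has to derive these. The sketch below shows the standard arithmetic for data-parallel training. The function names, the device count, and all numeric values are hypothetical and are not taken from Table 6. The 80,000 sample count simply adds the 60K single-turn and 20K multi-turn questions from the Dataset Splits row, and it is not known which training stage uses them.

```python
import math


def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int) -> int:
    """Samples contributing to one optimizer update under data-parallel training."""
    if min(per_device, grad_accum_steps, num_devices) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_device * grad_accum_steps * num_devices


def optimizer_steps_per_epoch(num_samples: int, per_device: int,
                              grad_accum_steps: int, num_devices: int) -> int:
    """Approximate optimizer updates per epoch (the last partial batch counts as one step)."""
    ebs = effective_batch_size(per_device, grad_accum_steps, num_devices)
    return math.ceil(num_samples / ebs)


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Linear warm-up steps implied by a warm-up ratio, rounded up here."""
    if not 0.0 <= warmup_ratio <= 1.0:
        raise ValueError("warmup_ratio must be in [0, 1]")
    return math.ceil(total_steps * warmup_ratio)


if __name__ == "__main__":
    # Hypothetical configuration: 4 samples/device, 8 accumulation steps, 8 GPUs.
    ebs = effective_batch_size(4, 8, 8)
    steps = optimizer_steps_per_epoch(80_000, 4, 8, 8)
    print(ebs, steps, warmup_steps(steps, 0.03))
```

Because trainers differ in how they round partial batches and warm-up steps, the computed counts are only approximations of what a given framework would actually run.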
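The Dataset Splits row states that each ESTP-IT instance includes questions, answers, and their corresponding valid answer intervals. The sketch below is a hypothetical record layout that makes this structure concrete. It assumes one valid time interval (in seconds of video) per answer and treats an instance with several questions as multi-turn. The class and method names are illustrative only and are not the authors' released format, because the code is not yet available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AnswerInterval:
    """Hypothetical span of video time (seconds) in which an answer counts as valid."""
    start_s: float
    end_s: float

    def __post_init__(self) -> None:
        if self.end_s < self.start_s:
            raise ValueError("interval end must not precede its start")

    def contains(self, t: float) -> bool:
        return self.start_s <= t <= self.end_s


@dataclass
class ESTPITInstance:
    """Hypothetical ESTP-IT record: questions plus answers paired with valid intervals."""
    questions: List[str]              # one question (single-turn) or several (multi-turn)
    answers: List[str]
    intervals: List[AnswerInterval]   # assumption: exactly one interval per answer

    def __post_init__(self) -> None:
        if len(self.answers) != len(self.intervals):
            raise ValueError("each answer needs exactly one valid interval")

    @property
    def is_multi_turn(self) -> bool:
        return len(self.questions) > 1

    def answer_at(self, t: float) -> Optional[str]:
        """Return the answer whose valid interval covers time t, or None."""
        for answer, interval in zip(self.answers, self.intervals):
            if interval.contains(t):
                return answer
        return None


if __name__ == "__main__":
    inst = ESTPITInstance(
        questions=["Where did I put the knife?"],
        answers=["On the cutting board."],
        intervals=[AnswerInterval(12.0, 18.5)],
    )
    print(inst.is_multi_turn, inst.answer_at(15.0), inst.answer_at(20.0))
```

Pairing each answer with its own interval keeps the timing constraint next to the content it governs. The paper's appendix, which describes the data engine, should be consulted for the actual schema.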