Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Gaze Beyond the Frame: Forecasting Egocentric 3D Visual Span

Authors: Heeseung Yun, Joonil Na, Jaeyeon Kim, Calvin Murdock, Gunhee Kim

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach outperforms competitive baselines for egocentric 2D gaze anticipation and 3D localization while achieving comparable results even when projected back onto 2D image planes without additional 2D-specific training. In addition, we curate a comprehensive benchmark from raw egocentric multisensory data, creating a testbed with 364.6K samples for 3D visual span forecasting.
Researcher Affiliation | Collaboration | Heeseung Yun (1), Joonil Na (1), Jaeyeon Kim (2), Calvin Murdock (3), Gunhee Kim (1); (1) Seoul National University, (2) Carnegie Mellon University, (3) Reality Labs Research at Meta
Pseudocode | No | The paper describes the methods in narrative text and uses figures (like Figure 2 and Figure 3) to illustrate the framework, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | We provide necessary details for reproducing experiments as well as source code as supplementary materials.
Open Datasets | Yes | In addition, we curate a comprehensive benchmark from raw egocentric multisensory data, creating a testbed with 364.6K samples for 3D visual span forecasting. [...] We curate a dataset by processing the raw data streams from an existing egocentric multisensory dataset. Aria Everyday Activities dataset [23] encompasses diverse scenarios... [...] we utilize Ego-Exo4D [24] as our source data... [...] The raw egocentric data sources used for curating the testbed are Aria Everyday Activities [2] and Ego-Exo4D [3]. Aria Everyday Activities (https://www.projectaria.com/datasets/aea/) is released under a custom license that permits academic research only, while Ego-Exo4D (https://ego-exo4d-data.org/) uses a custom license allowing both academic and commercial usage.
Dataset Splits | Yes | Consequently, our constructed FoVS (Forecasting 3D Visual Span)-Aria consists of 23.2k samples in total, with 19.3k, 1.9k, and 2.1k samples for train, validation, and test splits, respectively. [...] Thus, we collect a total of 341.4k samples, divided into 274.7k, 29.6k, and 37.0k samples for train, validation, and test splits, respectively.
Hardware Specification | Yes | All experiments are conducted using NVIDIA RTX A6000 GPUs with 48GB memory and 16 CPU cores. [...] using 8 CPU cores and a GPU with 12GB VRAM.
Software Dependencies | Yes | Additionally, we conduct experiments using PyTorch 2.4.1, employing Open3D [4] for data processing and the PyTorch-3DUNet [5] library for model construction, both of which are distributed under the MIT License.
Experiment Setup | Yes | For all reported experiments, we use the Adam optimizer [1] with a learning rate of 1e-4, without applying any scheduler or weight decay. We train our models with a batch size of 16 for 50 epochs and select the epoch that achieves the highest average IoU across all visual spans on the validation split for final testing. Input volumes are augmented through axis permutation and flipping (excluding upside-down orientation), along with random translation applied up to 2 units. The unidirectional transformer utilizes a feature dimension of C = 1024. ... an encoding progression of 5-128-256-512-1024 for 16^3 grids. Each U-Net layer consists of two applications of Conv-BatchNorm-ReLU-Dropout blocks, with a dropout rate of 0.1.
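The split sizes quoted in the Dataset Splits row can be cross-checked with a few lines of arithmetic. The sketch below is only a reader-side sanity check, not part of the paper's pipeline. The label "Ego-Exo4D-derived" is a placeholder introduced here, since the quoted text does not name the second benchmark, and the tolerance value is an assumption chosen to absorb rounding to one decimal place.

```python
# Split sizes in thousands of samples, copied from the Dataset Splits row.
# "Ego-Exo4D-derived" is a placeholder label (assumption), not the paper's name.
SPLITS_K = {
    "FoVS-Aria": {"train": 19.3, "val": 1.9, "test": 2.1, "reported_total": 23.2},
    "Ego-Exo4D-derived": {"train": 274.7, "val": 29.6, "test": 37.0, "reported_total": 341.4},
}

# Overall benchmark size quoted in the Research Type and Open Datasets rows.
REPORTED_BENCHMARK_TOTAL_K = 364.6

# Assumed tolerance: each quoted figure is rounded to 0.1K independently.
ROUNDING_TOLERANCE_K = 0.15


def summarize(splits):
    """Return the summed split size, its gap to the reported total, and split fractions."""
    rows = {}
    for name, entry in splits.items():
        parts = {key: entry[key] for key in ("train", "val", "test")}
        total = sum(parts.values())
        rows[name] = {
            "sum_of_splits": round(total, 1),
            "gap_to_reported": round(total - entry["reported_total"], 1),
            "fractions": {key: round(value / total, 3) for key, value in parts.items()},
        }
    return rows


summary = summarize(SPLITS_K)
combined_reported = round(sum(e["reported_total"] for e in SPLITS_K.values()), 1)

for name, row in summary.items():
    print(name, row)
print("combined reported total (K):", combined_reported)
```

Under these figures, each set of splits sums to within 0.1K of its reported total, which is consistent with independent rounding, and the two reported totals add up to the 364.6K quoted for the full testbed. Both benchmarks use roughly an 80/10/10 style division, with the test split slightly larger than validation.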
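The checkpoint selection rule in the Experiment Setup row (pick the epoch with the highest average IoU across all visual spans on the validation split) can be illustrated with a minimal sketch. This is not the authors' code. It assumes IoU is computed between sets of occupied voxel indices, that two empty volumes count as perfect agreement, and that ties go to the earliest epoch; the span names and scores in the usage example are hypothetical.

```python
def voxel_iou(pred, target):
    """Intersection over union between two collections of occupied voxel indices (i, j, k)."""
    pred, target = set(pred), set(target)
    union = pred | target
    if not union:
        # Assumption: two empty volumes are treated as perfect agreement.
        return 1.0
    return len(pred & target) / len(union)


def select_best_epoch(val_iou_per_epoch):
    """Return the epoch whose mean validation IoU over all spans is highest.

    val_iou_per_epoch maps epoch -> {span_name: iou}. Ties resolve to the
    earliest epoch (an assumption; the quoted text does not specify this).
    """
    def mean_iou(epoch):
        scores = list(val_iou_per_epoch[epoch].values())
        return sum(scores) / len(scores)

    # max() keeps the first maximal item, so sorting makes the earliest epoch win ties.
    return max(sorted(val_iou_per_epoch), key=mean_iou)


# Hypothetical per-epoch validation scores for two illustrative spans.
history = {
    1: {"span_a": 0.20, "span_b": 0.40},
    2: {"span_a": 0.50, "span_b": 0.30},
    3: {"span_a": 0.35, "span_b": 0.40},
}
best_epoch = select_best_epoch(history)
example_iou = voxel_iou({(0, 0, 0), (0, 0, 1)}, {(0, 0, 1), (0, 1, 1)})
print("best epoch:", best_epoch, "example IoU:", example_iou)
```

Averaging over spans before comparing epochs means a checkpoint that is strong on one span but weak on another can lose to a more balanced one, which matches the "highest average IoU across all visual spans" wording.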