Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention

Authors: Haijing Liu, Zhiyuan Song, Hefeng Wu, Tao Pu, Keze Wang, Liang Lin

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that CERES achieves state-of-the-art performance on Ego-RVOS benchmarks, highlighting the potential of applying causal reasoning to build more reliable models for broader egocentric video understanding.
Researcher Affiliation	Academia	Haijing Liu, Zhiyuan Song, Hefeng Wu, Tao Pu, Keze Wang, Liang Lin; Sun Yat-sen University, Guangzhou 510006, China
Pseudocode	No	The paper describes the methodology using equations and prose, for example, Equations (1) through (11), but no explicitly labeled 'Pseudocode' or 'Algorithm' block is present.
Open Source Code	No	We will release our code and detailed instructions for data preparation and reproducing the main experimental results after the review period.
Open Datasets	Yes	Following previous work [36], we evaluate our method on three public egocentric video datasets: VISOR [12], VOST [46], and VSCOS [62].
Dataset Splits	Yes	VISOR, derived from EPIC-KITCHENS [10, 11], provides annotations for hands and active object interactions; we utilize its training and validation splits. After preprocessing, this yields 13,205 videos (76,873 objects) for training and 467 videos (1,841 objects) for validation, where validation objects are manually annotated as positive or negative. VOST and VSCOS are used for validation only.
Hardware Specification	Yes	All experiments were conducted using PyTorch 2.1.2 and CUDA 11.8 on a system with four NVIDIA V100 GPUs.
Software Dependencies	Yes	All experiments were conducted using PyTorch 2.1.2 and CUDA 11.8 on a system with four NVIDIA V100 GPUs.
Experiment Setup	Yes	Models were trained for 6 epochs with a total batch size of 4, where each batch item was a single video clip. We initialized the learning rate to 1 × 10^-3 for our CERES modules and 1 × 10^-4 for pre-trained components, decaying it by 0.1 at epochs 3 and 5, using the AdamW optimizer. The primary segmentation loss combined bounding box, Dice and Focal losses. Input frames are resized to 448 × 448.
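
The learning-rate schedule quoted in the Experiment Setup row can be made concrete with a small sketch. The code below is illustrative only and is not the authors' code, which had not been released at the time of classification. The base rates, decay factor, milestone epochs, and epoch count come from the row above. The names `StepSchedule`, `lr_at`, and `schedule_table` are hypothetical, and the sketch assumes that epochs are 0-indexed and that decay takes effect at the start of each milestone epoch (the same convention as PyTorch's `MultiStepLR`); the paper excerpt does not say which indexing it uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSchedule:
    """Step decay: multiply the base rate by `gamma` once per milestone reached."""

    base_lr: float
    milestones: tuple = (3, 5)  # decay epochs, as quoted in the table
    gamma: float = 0.1          # decay factor, as quoted in the table

    def lr_at(self, epoch: int) -> float:
        # Assumption: 0-indexed epochs; a milestone counts once epoch >= milestone.
        reached = sum(1 for m in self.milestones if epoch >= m)
        return self.base_lr * self.gamma ** reached


NUM_EPOCHS = 6                              # as quoted in the table
ceres_modules = StepSchedule(base_lr=1e-3)  # newly added CERES modules
pretrained = StepSchedule(base_lr=1e-4)     # pre-trained components


def schedule_table(num_epochs: int = NUM_EPOCHS):
    """Return (epoch, lr for CERES modules, lr for pre-trained parts) per epoch."""
    return [
        (e, ceres_modules.lr_at(e), pretrained.lr_at(e))
        for e in range(num_epochs)
    ]


if __name__ == "__main__":
    for epoch, lr_new, lr_pre in schedule_table():
        print(f"epoch {epoch}: CERES modules {lr_new:.0e}, pre-trained {lr_pre:.0e}")
```

Under these assumptions both parameter groups keep their initial rate for epochs 0 to 2, drop by a factor of 10 for epochs 3 and 4, and drop by another factor of 10 for epoch 5. If the paper counts epochs from 1, each drop would occur one epoch earlier in 0-indexed terms.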