Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention
Authors: Haijing Liu, Zhiyuan Song, Hefeng Wu, Tao Pu, Keze Wang, Liang Lin
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that CERES achieves state-of-the-art performance on Ego-RVOS benchmarks, highlighting the potential of applying causal reasoning to build more reliable models for broader egocentric video understanding. |
| Researcher Affiliation | Academia | Haijing Liu, Zhiyuan Song, Hefeng Wu, Tao Pu, Keze Wang, Liang Lin; Sun Yat-sen University, Guangzhou 510006, China |
| Pseudocode | No | The paper describes the methodology using equations and prose, for example, Equations (1) through (11), but no explicitly labeled 'Pseudocode' or 'Algorithm' block is present. |
| Open Source Code | No | We will release our code and detailed instructions for data preparation and reproducing the main experimental results after the review period. |
| Open Datasets | Yes | Following previous work [36], we evaluate our method on three public egocentric video datasets: VISOR [12], VOST [46], and VSCOS [62]. |
| Dataset Splits | Yes | VISOR, derived from EPIC-KITCHENS [10, 11], provides annotations for hands and active object interactions; we utilize its training and validation splits. After preprocessing, this yields 13,205 videos (76,873 objects) for training and 467 videos (1,841 objects) for validation, where validation objects are manually annotated as positive or negative. VOST and VSCOS are used for validation only. |
| Hardware Specification | Yes | All experiments were conducted using PyTorch 2.1.2 and CUDA 11.8 on a system with four NVIDIA V100 GPUs. |
| Software Dependencies | Yes | All experiments were conducted using PyTorch 2.1.2 and CUDA 11.8 on a system with four NVIDIA V100 GPUs. |
| Experiment Setup | Yes | Models were trained for 6 epochs with a total batch size of 4, where each batch item was a single video clip. We initialized the learning rate to 1 × 10⁻³ for our CERES modules and 1 × 10⁻⁴ for pre-trained components, decaying it by 0.1 at epochs 3 and 5, using the AdamW optimizer. The primary segmentation loss combined bounding box, Dice and Focal losses. Input frames are resized to 448 × 448. |
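
As a rough illustration of the learning-rate schedule quoted in the Experiment Setup row, the sketch below computes the per-epoch rate for the two parameter groups under multi-step decay. It assumes 0-indexed epochs, with the decay taking effect from epochs 3 and 5 onward. The group names `ceres_modules` and `pretrained` and the helpers `step_decay_lr` and `schedule` are hypothetical and are not taken from the (not yet released) CERES code.

```python
from bisect import bisect_right

# Hyperparameters quoted in the Experiment Setup row.
NUM_EPOCHS = 6
MILESTONES = (3, 5)  # epochs at which the learning rate is multiplied by GAMMA
GAMMA = 0.1
# Hypothetical group names; the paper only distinguishes CERES vs. pre-trained parameters.
BASE_LRS = {
    "ceres_modules": 1e-3,  # newly added CERES modules
    "pretrained": 1e-4,     # pre-trained components
}


def step_decay_lr(base_lr: float, epoch: int,
                  milestones=MILESTONES, gamma: float = GAMMA) -> float:
    """Learning rate for a 0-indexed epoch under multi-step decay (assumed indexing)."""
    num_decays = bisect_right(milestones, epoch)  # number of milestones already reached
    return base_lr * gamma ** num_decays


def schedule() -> dict:
    """Per-epoch learning rate for each parameter group."""
    return {
        group: [step_decay_lr(lr, e) for e in range(NUM_EPOCHS)]
        for group, lr in BASE_LRS.items()
    }


if __name__ == "__main__":
    for group, lrs in schedule().items():
        print(group, ["%.0e" % lr for lr in lrs])
```

In PyTorch this would plausibly correspond to `torch.optim.AdamW` built with two parameter groups (`lr=1e-3` and `lr=1e-4`) plus `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 5], gamma=0.1)` stepped once per epoch. That the authors stepped the scheduler per epoch is itself an assumption.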
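
The Experiment Setup row names Dice and Focal terms in the segmentation loss but gives neither their weights nor their exact form. The dependency-free sketch below therefore uses the common DETR-style soft Dice loss and sigmoid focal loss (α = 0.25, γ = 2) on a flattened mask of per-pixel logits. The unit weights `w_dice` and `w_focal` and all function names are hypothetical, and the bounding-box term is omitted. This illustrates the two mask terms only and does not reproduce the CERES loss.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dice_loss(logits: Sequence[float], targets: Sequence[int], eps: float = 1.0) -> float:
    """Soft Dice loss: 1 - (2|P∩T| + eps) / (|P| + |T| + eps), with P the sigmoid probabilities."""
    probs = [sigmoid(z) for z in logits]
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def focal_loss(logits: Sequence[float], targets: Sequence[int],
               alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Mean sigmoid focal loss over pixels; targets are binary (0 or 1)."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        p_t = p if t == 1 else 1.0 - p              # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha  # class-balancing weight
        # Clamp p_t to avoid log(0) for extremely confident wrong predictions.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))
    return total / len(logits)


def mask_loss(logits: Sequence[float], targets: Sequence[int],
              w_dice: float = 1.0, w_focal: float = 1.0) -> float:
    """Hypothetical weighted sum of the Dice and focal mask terms."""
    return w_dice * dice_loss(logits, targets) + w_focal * focal_loss(logits, targets)


if __name__ == "__main__":
    print(mask_loss([10.0, -10.0], [1, 0]))  # confident and correct: small loss
    print(mask_loss([-10.0, 10.0], [1, 0]))  # confident and wrong: large loss
```

The Dice term scores region overlap independently of mask size, while the focal term down-weights pixels that are already classified confidently. This is the usual reason the two are paired for mask supervision.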