Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

OmniGaze: Reward-inspired Generalizable Gaze Estimation in the Wild

Authors: Hongyu Qu, Jianan Wei, Xiangbo Shu, Yazhou Yao, Wenguan Wang, Jinhui Tang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that OMNIGAZE achieves state-of-the-art performance on five datasets under both in-domain and cross-domain settings. Furthermore, we also evaluate the efficacy of OMNIGAZE as a scalable data engine for gaze estimation, which exhibits robust zero-shot generalization on four unseen datasets.
Researcher Affiliation	Academia	Hongyu Qu1, Jianan Wei2, Xiangbo Shu1, Yazhou Yao1, Wenguan Wang2, Jinhui Tang3 1Nanjing University of Science and Technology 2Zhejiang University 3Nanjing Forestry University
Pseudocode	Yes	Algorithm S1 provides the pseudo-code of the reward model.
Open Source Code	Yes	https://github.com/quhongyu/OmniGaze
Open Datasets	Yes	Concretely, we compile face images from six public datasets to construct a large-scale unlabeled dataset encompassing over 1.4 million images, which covers diverse head poses, lighting conditions, appearance, etc. Table 1 provides a detailed breakdown of this dataset. ... CelebA [59], VGGFace2 [60], FaceSynthetics [28], SFHQ-T2I [61], VFHQ [62], WebFace [27]
Dataset Splits	Yes	We curate a comprehensive training dataset by combining labeled gaze datasets with six public unlabeled face datasets [59, 60, 28, 61, 62, 27] (Table 1). Note that in in-domain gaze estimation, the labeled gaze datasets comprise ETH-XGaze [13] along with the specific evaluation training datasets; in cross-domain gaze estimation, we only use source datasets as labeled gaze datasets. For evaluation, beyond the test split of Gaze360 [14], we further assess OMNIGAZE on four widely used benchmarks: MPIIFaceGaze [43], EyeDiap [16], RT-Gene [44], and IVGaze [12].
Hardware Specification	Yes	OMNIGAZE is implemented in PyTorch, and trained on 4 NVIDIA RTX 3090 GPUs with 24GB memory per card.
Software Dependencies	No	OMNIGAZE is implemented in PyTorch
Experiment Setup	Yes	OMNIGAZE is trained with a batch size of 512. The training of OMNIGAZE can be divided into two stages: i) The teacher model is trained on labeled datasets for 50 epochs. We utilize the Adam optimizer [65] with an initial learning rate of 0.005, and a weight decay of 0.05. ii) We train the student model and reward model on both labeled and unlabeled data for 40 epochs with base learning rates of 0.001 and 0.0001, respectively. Hyper-parameter K is empirically set to 10.
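
The two-stage schedule quoted in the Experiment Setup row can be written down as a small configuration sketch. This is only an illustration of the reported hyperparameters, not the authors' code: the names `StageConfig`, `TEACHER_STAGE`, `STUDENT_STAGE`, and `per_gpu_batch` are hypothetical, and the per-GPU batch size assumes that the reported batch size of 512 is a global value split evenly across the 4 GPUs, which the quoted text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical container for one training stage (names are illustrative)."""
    name: str
    epochs: int
    learning_rates: dict  # model name -> base learning rate
    data: tuple           # kinds of data used in this stage


# Values below are taken from the quoted Experiment Setup and Hardware rows.
BATCH_SIZE = 512   # reported batch size
NUM_GPUS = 4       # 4 NVIDIA RTX 3090 GPUs
K = 10             # hyper-parameter K, set empirically

# Stage i: teacher trained on labeled data with Adam.
TEACHER_STAGE = StageConfig(
    name="teacher",
    epochs=50,
    learning_rates={"teacher": 0.005},
    data=("labeled",),
)
TEACHER_WEIGHT_DECAY = 0.05  # stated for stage i only

# Stage ii: student and reward model trained on labeled + unlabeled data.
# The quoted text does not say which optimizer or weight decay is used here.
STUDENT_STAGE = StageConfig(
    name="student_and_reward",
    epochs=40,
    learning_rates={"student": 0.001, "reward": 0.0001},
    data=("labeled", "unlabeled"),
)


def per_gpu_batch(batch_size: int = BATCH_SIZE, num_gpus: int = NUM_GPUS) -> int:
    """Per-GPU batch size, ASSUMING an even split of a global batch."""
    if batch_size % num_gpus != 0:
        raise ValueError("batch size is not divisible by the number of GPUs")
    return batch_size // num_gpus


if __name__ == "__main__":
    for stage in (TEACHER_STAGE, STUDENT_STAGE):
        print(stage.name, stage.epochs, stage.learning_rates, stage.data)
    print("per-GPU batch (assumed even split):", per_gpu_batch())
```

Keeping the two stages as separate records makes it explicit which values the summary actually reports for each stage and which (such as the stage ii optimizer settings) are left unspecified.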