Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

A Generalized Label Shift Perspective for Cross-Domain Gaze Estimation

Authors: Hao-Ran Yang, Xiaohui Chen, Chuan-Xian Ren

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on standard CDGE tasks with different backbone models validate the superior cross-domain generalization capability of the proposed method and its applicability to various models.
Researcher Affiliation	Academia	Hao-Ran Yang, Sun Yat-Sen University, Guangzhou, China; Xiaohui Chen, Sun Yat-Sen University, Guangzhou, China; Chuan-Xian Ren, Sun Yat-Sen University, Guangzhou, China
Pseudocode	Yes	Algorithm 1: Optimization of GLSGE
Open Source Code	No	As our work is about a general framework for CDGE problems, the reproducibility can be guaranteed by the algorithm description and implementation details provided in Sec. 4 and Appendix A.
Open Datasets	Yes	We conduct experiments on four standard CDGE datasets: ETH-XGaze (DE) [47], Gaze360 (DG) [13], MPIIFaceGaze (DM) [49] and EyeDiap (DD) [8].
Dataset Splits	Yes	During cross-domain learning, we use 10% of the unlabeled target domain images for training and another 10% for validation, with the remaining 80% used for testing. It means that 4500 images in DM and 1667 images in DD are used for training in each task.
Hardware Specification	Yes	An NVIDIA RTX 4080 GPU is used for the experiments.
Software Dependencies	No	The paper mentions the Adam optimizer and a cosine annealing scheduler but does not provide version numbers for any software libraries or frameworks, which this variable requires.
Experiment Setup	Yes	We use the Adam optimizer with a learning rate of 3e-5 and a cosine annealing scheduler to decrease the learning rate during training. The batch size is set to 100. As the domain shift is distinct at the beginning, we alternately correct the label shift and the conditional shift to produce better pseudo labels. The confidence that decides the truncated area in the label shift correction process is empirically set to 0.7 for all tasks.
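
As a rough illustration of the Dataset Splits and Experiment Setup rows, the sketch below shows one way the 10% / 10% / 80% target-domain split and a cosine-annealed learning rate could be computed. It is not the authors' code, since none is released. The function names, the random seed, the total step count, and the minimum learning rate of 0 are assumptions made for illustration. The sketch also assumes the quoted learning rate means 3e-5, and that DM holds 45000 images in total, as implied by the 4500 training images quoted for it.

```python
import math
import random


def split_target_indices(n_images, train_frac=0.10, val_frac=0.10, seed=0):
    """Shuffle image indices and cut them into train/val/test index lists.

    The fractions follow the quoted 10% / 10% / 80% protocol; the seed and
    the shuffling strategy are assumptions, not taken from the paper.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_train = round(n_images * train_frac)
    n_val = round(n_images * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # everything left over
    return train, val, test


def cosine_annealing_lr(step, total_steps, base_lr=3e-5, min_lr=0.0):
    """Closed-form cosine annealing from base_lr (step 0) to min_lr (last step).

    total_steps and min_lr are hypothetical; the paper only names the
    scheduler type and the initial learning rate.
    """
    cos_term = 1.0 + math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


if __name__ == "__main__":
    # 45000 is the DM size implied by "4500 images in DM" being 10%.
    train, val, test = split_target_indices(45000)
    print(len(train), len(val), len(test))
    # Learning rate at the start, middle, and end of a hypothetical 1000-step run.
    for step in (0, 500, 1000):
        print(step, cosine_annealing_lr(step, total_steps=1000))
```

The split is done on indices rather than on image files so that the same seed reproduces the same partition regardless of how the images are stored.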