Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution

Authors: Limeng Qiao, Yiyang Gan, Bairui Wang, Jie Qin, Shuang Xu, Siqi Yang, Lin Ma

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations demonstrate the effectiveness of our proposed methods.
Researcher Affiliation	Collaboration	Limeng Qiao Yiyang Gan Bairui Wang Jie Qin Shuang Xu Siqi Yang Lin Ma Meituan Inc. EMAIL, EMAIL, EMAIL EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper includes architectural diagrams (Figure 2) and describes methods in prose, but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The code and models are available here.
Open Datasets	Yes	We collect public accessible image-text pairs and build our Merged-1B dataset, which is composed of DataComp-1B [21], COYO [22], LAION-2B [23], LAION-400M [24], DFN-2B [22], CC12M [25] and CC3M [26]. Moreover, to further enhance the video feature extraction capabilities of UniViTAR, we meticulously constructed a dataset Merged-65M of roughly 65 million samples by randomly selecting video clips from three public accessible video datasets, i.e., Panda-70M [27], WebVid-10M [28], and InternVid-10M-FLT [29].
Dataset Splits	Yes	For cross-modal retrieval assessment, we adopt the benchmark protocols defined in [41], evaluating on Flickr [42] and MS-COCO [43] using their official partitions.
Hardware Specification	Yes	Note all experiments are conducted on H800 GPUs.
Software Dependencies	No	To enhance training efficiency, we integrated the DeepSpeed library [30] by employing ZeRO optimizer sharding [31], gradient checkpointing [32], and flash attention [33].
Experiment Setup	Yes	The detailed hyperparameter configurations for each training stage are presented in the Table 11.