Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model

Authors: Jialong Zuo, Yongtai Deng, Mengdan Tan, Rui Jin, Dongyue Wu, Nong Sang, Liang Pan, Changxin Gao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the advancement and practicality of our ORBench. A range of models have been compared on it, and our proposed ReID5o gives the best performance.
Researcher Affiliation | Collaboration | Jialong Zuo 1, Yongtai Deng 1, Mengdan Tan 1, Rui Jin 1, Dongyue Wu 1, Nong Sang 1, Liang Pan 2, Changxin Gao 1; 1 National Key Laboratory of Multispectral Information Intelligent Processing Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; 2 Shanghai AI Laboratory.
Pseudocode | No | The paper describes the method using textual explanations and a schematic diagram (Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code & Dataset: https://github.com/Zplusdragon/ReID5o_ORBench
Open Datasets | Yes | Code & Dataset: https://github.com/Zplusdragon/ReID5o_ORBench To address dataset scarcity, we construct ORBench, the first high-quality multi-modal dataset comprising 1,000 unique identities across five modalities: RGB, infrared, color pencil, sketch, and textual description.
Dataset Splits | Yes | There are 1,000 valid identities in ORBench dataset. We have a fixed split using 600 identities for training and 400 identities for testing. During training, all multi-modal data for the 600 persons in the training set can be applied.
Hardware Specification | Yes | We use a single A100 80GB GPU.
Software Dependencies | No | The paper mentions using a pre-trained multi-modal encoder, i.e., CLIP-B/16, and the Adam optimizer, but it does not specify version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | For each layer of the feature mixture (FM), the hidden size and number of heads are set to 512 and 8. The hyper-parameter α for the ID loss is set to 1.0. During training, all images are uniformly resized to 384 × 128 and the maximum length of the textual tokens is set to 77. Also, random erasing, horizontally flipping and crop with padding are employed for image augmentation. Random masking and replacement is employed for text augmentation. Our ReID5o is trained with Adam [19] for 60 epochs with an initial learning rate 1e-5. We spend 5 warm-up epochs linearly increasing the learning rate from 1e-6 to 1e-5. For the random-initialized experts and feature mixture, the initial learning rate is set to 5e-5.
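
The learning-rate settings quoted in the Experiment Setup row can be illustrated with a short Python sketch. This is not the authors' code: the function name lr_at_epoch and the constant names are hypothetical, epochs are assumed to be 0-indexed, and the behavior after warm-up is an assumption (held constant here), since the quoted text does not state a decay schedule. The quoted text also does not say how the warm-up applies to the randomly initialized modules, so the sketch assumes the same 0.1x starting ratio for them.

```python
# Values quoted in the Experiment Setup row.
EPOCHS = 60
WARMUP_EPOCHS = 5
WARMUP_START_LR = 1e-6
BASE_LR = 1e-5         # pre-trained encoder
NEW_MODULE_LR = 5e-5   # randomly initialized experts and feature mixture


def lr_at_epoch(epoch, base_lr=BASE_LR, start_lr=WARMUP_START_LR,
                warmup_epochs=WARMUP_EPOCHS):
    """Return the learning rate for a 0-indexed epoch.

    Linear warm-up from start_lr to base_lr over warmup_epochs epochs.
    ASSUMPTION: the rate is held constant after warm-up; the source does
    not describe any decay.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch >= warmup_epochs:
        return base_lr
    fraction = epoch / warmup_epochs
    return start_lr + fraction * (base_lr - start_lr)


# Schedule for the pre-trained encoder parameters.
schedule = [lr_at_epoch(e) for e in range(EPOCHS)]

# ASSUMPTION: the new modules warm up from 0.1x their own base rate,
# mirroring the 1e-6 -> 1e-5 ratio quoted for the encoder.
new_module_schedule = [
    lr_at_epoch(e, base_lr=NEW_MODULE_LR, start_lr=0.1 * NEW_MODULE_LR)
    for e in range(EPOCHS)
]
```

In a real training loop, the two schedules would typically be attached to separate optimizer parameter groups, one for the pre-trained encoder and one for the newly initialized modules.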