Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LVLM-Driven Attribute-Aware Modeling for Visible-Infrared Person Re-Identification

Authors: Zhiqi Pang, Lingling Zhao, Junjie Wang, Chunyu Wang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on VI-ReID datasets validate the effectiveness of the proposed LVLM-AAM and its individual components. LVLM-AAM not only significantly outperforms existing unsupervised methods but also surpasses several supervised methods.
Researcher Affiliation | Academia | Harbin Institute of Technology, China; Nanjing Medical University, China
Pseudocode | Yes | The overall algorithmic procedure is provided in Supplementary Material Section S.III.
Open Source Code | No | Given that our method represents a novel and preliminary attempt, we are considering releasing a more polished and comprehensive version of the code in the future. In the meantime, we have provided key experimental details in the Implementation Details subsection and included an algorithmic procedure in Section S.III of the Supplementary Material to enable researchers to partially or fully reproduce our method.
Open Datasets | Yes | We evaluate our method on the SYSU-MM01 [31], RegDB [23] and LLCM [52] datasets.
Dataset Splits | Yes | Following existing methods [3, 37], a total of 22,258 visible images and 11,909 infrared images from 395 identities are used for training. The query set and gallery consist of infrared and visible images, respectively, from the remaining 96 identities. RegDB contains 412 identities, with each identity having 10 visible images and 10 thermal infrared images. Following existing protocols [3, 37], we use images from 206 identities for training and the remaining 206 identities for testing.
Hardware Specification | Yes | The experiments are conducted on four NVIDIA GeForce RTX 4090 GPUs.
Software Dependencies | No | The image encoder of LVLM-AAM is based on a pretrained ResNet-50 [13] and consists of two branches to separately handle inputs from the visible and infrared modalities. We use DBSCAN [6] to perform intra-modality clustering. We adopt the Adam optimizer [16] for model training. The pretrained models and large vision-language models used in this paper are publicly available and widely adopted in the research community.
Experiment Setup | Yes | All images are resized to 288 × 144, and random flipping, random grayscale conversion [19], channel augmentation [47], and random erasing [55] are applied as data augmentation. We set the batch size B to 128. In each iteration, we randomly select 8 clusters from each modality and sample 16 images from each cluster. We use DBSCAN [6] to perform intra-modality clustering, where the distance threshold and the minimum number of samples are set to 0.6 and 4, respectively, on SYSU-MM01 [31], and to 0.3 and 4 on RegDB [23]. We adopt the Adam optimizer [16] for model training. Homogeneous learning (i.e., Eq. 3) is performed for 50 epochs, followed by an update of the learnable text embeddings (i.e., Eq. 6) over another 50 epochs. Finally, heterogeneous learning (i.e., Eq. 11) is conducted for an additional 50 epochs. The initial learning rate is set to 0.00035 and is decayed by a factor of 10 every 20 epochs. The temperature hyperparameter τ is set to 0.05. For attribute-aware refinement (AR), we set η = 2. For attribute-aware contrastive learning (AAC), we set α = 0.5. For the weights of L_inter and L_tsc, we set λ_inter = 0.5 and λ_tsc = 0.5.
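The Dataset Splits row describes identity-disjoint protocols: no identity used for training reappears at test time. A minimal sketch of the RegDB arithmetic follows, assuming a random partition of integer identity labels. `split_identities` and the seed are hypothetical, and the sketch does not reproduce the split files distributed with RegDB or the exact protocol of [3, 37].

```python
import random
from typing import List, Tuple


def split_identities(num_ids: int, num_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: randomly partition identity labels into disjoint train/test sets."""
    ids = list(range(num_ids))
    random.Random(seed).shuffle(ids)
    return sorted(ids[:num_train]), sorted(ids[num_train:])


# RegDB as described above: 412 identities, 10 visible + 10 thermal images each.
IMAGES_PER_ID_PER_MODALITY = 10
train_ids, test_ids = split_identities(412, 206, seed=0)

# Every training identity contributes all of its images in both modalities.
num_train_visible = len(train_ids) * IMAGES_PER_ID_PER_MODALITY
num_train_thermal = len(train_ids) * IMAGES_PER_ID_PER_MODALITY
print(len(train_ids), len(test_ids), num_train_visible, num_train_thermal)
# prints: 206 206 2060 2060
```

Under this reading, each half of RegDB holds 2,060 visible and 2,060 thermal images. The SYSU-MM01 split in the same row follows the same identity-disjoint idea, with 395 training and 96 test identities.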
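The Software Dependencies and Experiment Setup rows name DBSCAN [6] for intra-modality clustering, with (distance threshold, minimum samples) set to (0.6, 4) on SYSU-MM01 and (0.3, 4) on RegDB. The sketch below is a plain DBSCAN over cosine distance, run on one modality at a time. The cosine metric, the toy 2-D features, and all function names are assumptions, because the excerpt does not state which distance the paper clusters on. A production pipeline would more likely call `sklearn.cluster.DBSCAN(eps=..., min_samples=..., metric="precomputed")` on a precomputed distance matrix.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1
# (distance threshold eps, minimum samples) as reported in the Experiment Setup row
DBSCAN_PARAMS = {"SYSU-MM01": (0.6, 4), "RegDB": (0.3, 4)}


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(features: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per sample, or NOISE for outliers."""
    n = len(features)
    # eps-neighbourhoods; a point counts as its own neighbour (scikit-learn convention)
    neighbours = [
        [j for j in range(n) if cosine_distance(features[i], features[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n  # None = not visited yet
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # core point: expand the cluster
        cluster_id += 1
    return [label if label is not None else NOISE for label in labels]


# Hypothetical toy "visible" features: two tight groups and one outlier.
visible_feats = [
    [1.0, 0.0], [1.0, 0.1], [1.0, 0.2], [0.9, 0.1],
    [0.0, 1.0], [0.1, 1.0], [0.2, 1.0], [0.1, 0.9],
    [-1.0, 0.05],
]
eps, min_samples = DBSCAN_PARAMS["RegDB"]
visible_labels = dbscan(visible_feats, eps, min_samples)
print(visible_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The same call would be repeated on the infrared features, so visible and infrared pseudo-labels come from separate clusterings. Points labelled -1 are outliers and are often left out of the batch sampler.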
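The Experiment Setup row gives a three-stage, 150-epoch schedule, a base learning rate of 0.00035 with a 10× step decay every 20 epochs, and batches of 8 clusters × 16 images per modality. Since 8 × 16 = 128 matches B, B appears to count images per modality. The sketch below reads "decays 10 times" as multiplication by 0.1 and assumes the epoch counter restarts at each 50-epoch stage. The excerpt confirms neither point. `step_lr`, `sample_batch`, and the with-replacement fallback for clusters smaller than 16 images are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional

BASE_LR = 0.00035
DECAY_FACTOR = 0.1  # assumption: "decays 10 times" = divide by 10
STEP_EPOCHS = 20
# (stage, epochs) as listed in the Experiment Setup row
STAGES = [
    ("homogeneous learning (Eq. 3)", 50),
    ("learnable text embeddings (Eq. 6)", 50),
    ("heterogeneous learning (Eq. 11)", 50),
]


def step_lr(epoch_in_stage: int) -> float:
    """Step decay; assumes the epoch counter restarts at each stage (hypothetical)."""
    return BASE_LR * DECAY_FACTOR ** (epoch_in_stage // STEP_EPOCHS)


def sample_batch(pseudo_labels: List[int], num_clusters: int = 8,
                 per_cluster: int = 16, rng: Optional[random.Random] = None) -> List[int]:
    """Sample indices for one modality: num_clusters clusters x per_cluster images each."""
    rng = rng if rng is not None else random.Random()
    by_cluster: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        if label != -1:  # skip DBSCAN outliers
            by_cluster[label].append(idx)
    chosen = rng.sample(sorted(by_cluster), num_clusters)
    batch: List[int] = []
    for c in chosen:
        members = by_cluster[c]
        if len(members) >= per_cluster:
            batch.extend(rng.sample(members, per_cluster))  # without replacement
        else:
            batch.extend(rng.choices(members, k=per_cluster))  # hypothetical fallback
    return batch


for name, epochs in STAGES:
    lrs = [step_lr(e) for e in range(epochs)]
    print(f"{name}: {epochs} epochs, lr {lrs[0]:.2e} -> {lrs[-1]:.2e}")

# Toy pseudo-labels: 10 clusters of 20 images plus 5 outliers.
toy_labels = [c for c in range(10) for _ in range(20)] + [-1] * 5
batch = sample_batch(toy_labels, rng=random.Random(0))
print(len(batch))  # 128 = 8 clusters x 16 images
```

In a PyTorch setup, the same decay could be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)` wrapped around `torch.optim.Adam`. Whether the scheduler is re-created at each stage is the open question noted above.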