Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification

Authors: Chenyang Yu, Xuehu Liu, Jiawen Zhu, Yuhao Wang, Pingping Zhang, Huchuan Lu

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our method outperforms other state-of-the-art methods on both image and video person ReID benchmarks. Experiments conducted on three video-based and two image-based person ReID benchmarks clearly demonstrate the effectiveness of our methods.
Researcher Affiliation | Academia | 1 School of Information and Communication Engineering, Dalian University of Technology, Dalian, China; 2 School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China; 3 School of Future Technology, School of Artificial Intelligence, Dalian University of Technology, Dalian, China; EMAIL; EMAIL; EMAIL
Pseudocode | No | The paper describes methods like Multi-Memory Collaboration (MMC) and Multi-Temporal Mamba (MTM) using prose and mathematical equations (e.g., Eq. 1-11) and diagrams (e.g., Fig. 2, 3), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/AsuradaYuci/CLIMB-ReID
Open Datasets | Yes | We evaluate our approach on three video-based person ReID benchmarks, including MARS (Zheng et al. 2016), LS-VID (Li et al. 2019) and iLIDS-VID (Wang et al. 2014). The proposed method is also validated on two image-based person ReID datasets, i.e., Market1501 (Zheng et al. 2015) and MSMT17 (Wei et al. 2018).
Dataset Splits | Yes | We evaluate our approach on three video-based person ReID benchmarks, including MARS (Zheng et al. 2016), LS-VID (Li et al. 2019) and iLIDS-VID (Wang et al. 2014). The proposed method is also validated on two image-based person ReID datasets, i.e., Market1501 (Zheng et al. 2015) and MSMT17 (Wei et al. 2018). More details of these datasets can be found in the Supplementary. Following common practices, the Cumulative Matching Characteristic (CMC)@K (K = 1, 5) and mean Average Precision (mAP) are adopted to measure the performance.
Hardware Specification | No | The paper does not explicitly state the specific hardware used for running its experiments, such as GPU models, CPU models, or other detailed computer specifications.
Software Dependencies | No | The paper mentions using "ViT-B/16 from CLIP (Radford et al. 2021)" as a feature encoder but does not provide specific version numbers for software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | We use the ViT-B/16 from CLIP (Radford et al. 2021) as the feature encoder, which contains 12 Transformer layers with a hidden size of 768. For MMC, we use the Mean and Hard selection strategies to generate P = 2 memories. We set µ to 0.2. During training, we adopt random flipping, random cropping and random erasing (Zhong et al. 2020) for data augmentation. Each frame is resized to 256 × 128. We train the framework for 60 epochs in total. For video-based person ReID, the mini-batch size is 128, consisting of 4 identities, 4 tracklets for each identity and 8 frames from each tracklet. We utilize the Adam optimizer with a learning rate of 5 × 10⁻⁶. We warm up the model for 10 epochs, linearly increasing the learning rate from 5 × 10⁻⁷ to 5 × 10⁻⁶. Afterwards, the learning rate is reduced by a factor of 0.1 at the 30th and 50th epochs. The slice stride in MTM is set to S = [1, 4, 8]. For image-based person ReID, the mini-batch size is 128, consisting of 16 identities and 8 images for each identity. We utilize the SGD optimizer with a learning rate of 3.5 × 10⁻⁴ and a weight decay of 5 × 10⁻⁴. Similarly, we also warm up the model for 10 epochs, linearly increasing the learning rate from 3.5 × 10⁻⁵ to 3.5 × 10⁻⁴. The cosine distance is employed as the distance metric for ranking.
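The CMC@K and mAP metrics cited in the "Dataset Splits" row are standard ranking metrics over a query-gallery distance matrix. A minimal sketch of how they are typically computed is below; the function name is ours, and the usual ReID-specific filtering of same-camera and junk gallery entries is omitted, so this is illustrative, not the authors' evaluation code.

```python
import numpy as np

def cmc_and_map(dist, q_ids, g_ids, topk=(1, 5)):
    """CMC@K and mAP for a query-gallery distance matrix (smaller = closer).

    Simplified sketch: standard ReID evaluation additionally removes
    same-camera and junk gallery entries per query, which is skipped here.
    """
    num_q = dist.shape[0]
    cmc_hits = np.zeros(max(topk))
    aps = []
    for i in range(num_q):
        order = np.argsort(dist[i])          # gallery indices, best match first
        matches = g_ids[order] == q_ids[i]   # relevance of each ranked entry
        if not matches.any():                # query with no gallery match
            continue
        first_hit = int(np.argmax(matches))  # rank of first correct match
        cmc_hits[first_hit:] += 1            # CMC@k: hit within the top k ranks
        hit_ranks = np.flatnonzero(matches)
        # Average precision: mean of precision at each correct-match rank.
        precisions = (np.arange(len(hit_ranks)) + 1) / (hit_ranks + 1)
        aps.append(precisions.mean())
    cmc = cmc_hits / num_q
    return {f"CMC@{k}": float(cmc[k - 1]) for k in topk}, float(np.mean(aps))
```

CMC@K answers "did a correct match appear in the top K?", while mAP rewards ranking all correct gallery entries early, which is why both are reported together.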