Text-Based Occluded Person Re-identification via Multi-Granularity Contrastive Consistency Learning

Authors: Xinyi Wu, Wentao Ma, Dan Guo, Tongqing Zhou, Shan Zhao, Zhiping Cai

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our method exhibits superior performance.
Researcher Affiliation | Academia | (1) College of Computer, National University of Defense Technology, Changsha, China; (2) School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, China; (3) School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
Pseudocode | No | The paper mentions 'The detailed occlusion generation algorithm is described in Appendix B.' but does not include any clearly labeled pseudocode or algorithm blocks in the main text provided.
Open Source Code | Yes | The source code is available at https://github.com/littlexinyi/MGCC.
Open Datasets | Yes | We construct three occluded datasets via OGor, called Occluded-CUHK-PEDES, Occluded-ICFG-PEDES, and Occluded-RSTPReid, based on three existing T-ReID datasets. Different from the existing Random Erase (Zhong et al. 2020) and Random Cropping (Chen et al. 2021a) methods, which have weak generation ability when facing diversified occlusions, our OGor adopts an occlusion sample augmentation strategy with realistic occlusion scenario simulation, which mainly contains the following two steps: ... to the existing three T-ReID datasets (Li et al. 2017; Ding et al. 2021; Zhu et al. 2021) to construct new occluded datasets for TOReID. (A rough sketch of this occlusion-pasting step is given after the table.)
Dataset Splits | Yes | To generate occluded images, we randomly select 30% of the whole train, val, and test images within the same ID but different views from the T-ReID dataset. (A minimal selection sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions models and tools like Mask R-CNN and CLIP but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Training and Inference. Similarity calculation: for a given pair $(I_k, T_k)$ in a batch $B = \{(I_k, T_k)\}_{k=1}^{N}$, the final similarity, which aggregates the multi-granularity contrastive scores, is $S(I_k, T_k) = (S_{PW} + S_{IT} + S_{PT} + S_{IW})/4$. Objective loss function: the InfoNCE loss is utilized to pull the positive instances together and push away the hard negative ones within a batch $B$ of image-text pairs ... where $\tau$ is the temperature hyper-parameter of the softmax ... $Z_i = \rho_i n$ and $Z_t = \rho_t m$, where $\rho_i$ and $\rho_t$ represent the token selection ratios for images and texts, respectively. Detailed ablation studies of $\rho_i$ and $\rho_t$ are presented in Appendix F. (A hedged PyTorch sketch of these formulas follows the table.)
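
The OGor generator itself is only described at a high level in the Open Datasets row (extract occluder objects, then paste them into person images). The sketch below is a rough, hedged illustration of such an occlusion-pasting step, not the authors' released implementation: the function name, the scale range, and the placement distribution are all assumptions; the occluder crop and its mask are assumed to come from an instance-segmentation model such as Mask R-CNN.

```python
import random
import numpy as np
from PIL import Image

def paste_occluder(person_img: Image.Image, occluder: Image.Image,
                   occluder_mask: np.ndarray) -> Image.Image:
    """Paste a segmented occluder object onto a person image.

    Hypothetical OGor-style step: `occluder` is an object crop and
    `occluder_mask` its binary foreground mask (e.g. from Mask R-CNN).
    """
    img = person_img.copy()
    w, h = img.size
    # Scale the occluder relative to the person image (range is an assumption).
    scale = random.uniform(0.3, 0.6)
    ow, oh = int(w * scale), int(h * scale)
    occluder = occluder.resize((ow, oh))
    mask = Image.fromarray((occluder_mask * 255).astype(np.uint8)).resize((ow, oh))
    # Random placement biased toward the lower half of the frame;
    # the true position distribution used by OGor is not specified here.
    x = random.randint(0, max(w - ow, 0))
    y = random.randint(h // 2, max(h - oh, h // 2))
    img.paste(occluder, (x, y), mask)
    return img
```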
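For the 30% selection quoted in the Dataset Splits row, a minimal sketch could look as follows, assuming each split is a list of (image_path, person_id) records; the paper's additional constraint of picking images within the same ID but different views is simplified away here.

```python
import random

def select_for_occlusion(records, ratio=0.3, seed=0):
    """Randomly pick `ratio` of a split's images to be occluded.

    `records` is assumed to be a list of (image_path, person_id) pairs;
    the returned subset would then be passed through the occlusion
    generator. Same-ID / different-view grouping is not modeled.
    """
    rng = random.Random(seed)
    k = int(len(records) * ratio)
    return rng.sample(records, k)
```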
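Finally, the quoted similarity averaging, InfoNCE objective, and token-selection counts from the Experiment Setup row can be read as the following hedged PyTorch sketch. The function names, the temperature value, and the token-scoring input are assumptions for illustration, not the released MGCC code.

```python
import torch
import torch.nn.functional as F

def multi_granularity_similarity(s_pw, s_it, s_pt, s_iw):
    # S(I_k, T_k) = (S_PW + S_IT + S_PT + S_IW) / 4, each argument an
    # [N, N] batch similarity matrix (patch-word, image-text,
    # patch-text, image-word).
    return (s_pw + s_it + s_pt + s_iw) / 4

def info_nce(sim, tau=0.02):
    """Symmetric InfoNCE over a batch of N image-text pairs.

    sim[i, j] is the similarity of image i and text j; diagonal entries
    are the positives. tau is the softmax temperature hyper-parameter
    (the value 0.02 is a placeholder, not taken from the paper).
    """
    labels = torch.arange(sim.size(0), device=sim.device)
    loss_i2t = F.cross_entropy(sim / tau, labels)
    loss_t2i = F.cross_entropy(sim.t() / tau, labels)
    return (loss_i2t + loss_t2i) / 2

def select_tokens(tokens, scores, ratio):
    """Keep the top Z = ratio * L tokens (Z_i = rho_i * n, Z_t = rho_t * m).

    tokens: [N, L, D] token features; scores: [N, L] informativeness
    scores. How the scores are computed is not shown in the quoted text.
    """
    z = max(int(tokens.size(1) * ratio), 1)
    idx = scores.topk(z, dim=1).indices                    # [N, Z]
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
    return tokens.gather(1, idx)                           # [N, Z, D]
```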