Text-based Person Search via Multi-Granularity Embedding Learning

Authors: Chengji Wang, Zhiming Luo, Yaojin Lin, Shaozi Li

IJCAI 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments validate the effectiveness of our method, which can achieve new state-of-the-art performance by the learned discriminative partial embeddings. |
| Researcher Affiliation | Academia | ¹Department of Artificial Intelligence, Xiamen University, China; ²School of Computer Science, Minnan Normal University, China |
| Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | CUHK-PEDES dataset [Li et al., 2017b] is a large-scale text-based person search dataset that contains 40,206 images from 13,003 identities. |
| Dataset Splits | Yes | We split the dataset into three subsets: training set, validation set, and testing set. The person identities of these three subsets are disjoint. The training set includes 34,054 images and 68,126 textual descriptions of 11,003 persons. The validation set has 3,078 images and 6,158 textual descriptions of 1,000 persons. The testing set contains 3,074 images and 6,156 textual descriptions of 1,000 persons. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., specific library names like PyTorch or TensorFlow, along with their versions). |
| Experiment Setup | Yes | The vocabulary includes 12,000 words, and we represent each word by a 300-dimension vector. The feature dimension c is 512. We use Adam optimization to train the model with a learning rate of 2e-4. All the models are trained with 50 epochs and a mini-batch contains 32 image-text pairs. |
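
Because the Dataset Splits row above quotes exact image, caption, and identity counts, those figures can be sanity-checked against the public CUHK-PEDES annotations. Below is a minimal sketch, assuming the annotations ship as a single JSON list of per-image records with `split`, `id`, `file_path`, and `captions` fields; the file name and field names are assumptions for illustration, not verified against the original release.

```python
import json
from collections import defaultdict

# Hypothetical annotation file; field names below are assumed, not confirmed.
with open("reid_raw.json") as f:
    records = json.load(f)

stats = defaultdict(lambda: {"images": 0, "captions": 0, "ids": set()})
for r in records:
    s = stats[r["split"]]            # assumed values: "train" / "val" / "test"
    s["images"] += 1                 # one record per image
    s["captions"] += len(r["captions"])
    s["ids"].add(r["id"])

for split, s in stats.items():
    print(f'{split}: {s["images"]} images, {s["captions"]} captions, '
          f'{len(s["ids"])} identities')

# The paper states that person identities are disjoint across the three splits.
train_ids, val_ids, test_ids = (stats[k]["ids"] for k in ("train", "val", "test"))
assert train_ids.isdisjoint(val_ids) and train_ids.isdisjoint(test_ids) \
    and val_ids.isdisjoint(test_ids), "splits share person identities"
```

If the annotations match the paper's split, the printed counts should be 34,054 images / 68,126 captions / 11,003 identities for training, 3,078 / 6,158 / 1,000 for validation, and 3,074 / 6,156 / 1,000 for testing.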
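The Experiment Setup row pins down enough hyperparameters to reconstruct the training configuration. The following is a minimal PyTorch sketch wiring up only those stated values; the text-encoder architecture, data loader, and loss are placeholders, since the paper's actual multi-granularity model and objectives are not specified in the extracted quote.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 12_000   # "The vocabulary includes 12,000 words"
WORD_DIM   = 300      # each word represented by a 300-dimension vector
FEAT_DIM   = 512      # shared feature dimension c
LR         = 2e-4     # Adam learning rate
EPOCHS     = 50       # all models trained for 50 epochs
BATCH_SIZE = 32       # mini-batch of 32 image-text pairs

class TextEncoder(nn.Module):
    """Placeholder text branch: word embedding + BiLSTM pooled to FEAT_DIM.
    This stands in for the paper's (unspecified here) actual architecture."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, WORD_DIM)
        self.lstm = nn.LSTM(WORD_DIM, FEAT_DIM // 2,
                            bidirectional=True, batch_first=True)

    def forward(self, tokens):           # tokens: (B, T) word indices
        h, _ = self.lstm(self.embed(tokens))
        return h.mean(dim=1)             # (B, FEAT_DIM) pooled embedding

model = TextEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

# Training-loop skeleton using the stated schedule; `loader` and
# `compute_loss` are hypothetical stand-ins for the paper's pipeline.
# for epoch in range(EPOCHS):
#     for images, tokens in loader:     # 32 image-text pairs per batch
#         loss = compute_loss(model, images, tokens)
#         optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Note the bidirectional LSTM uses a hidden size of FEAT_DIM // 2 so that the concatenated forward and backward states match the stated 512-dimension feature space; that choice is a common convention, not something the quoted setup confirms.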