Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search

Authors: Ya Jing, Chenyang Si, Junbo Wang, Wei Wang, Liang Wang, Tieniu Tan (pp. 11189-11196)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of our model, we perform extensive experiments on the CUHK Person Description Dataset (CUHK-PEDES), which is currently the only available dataset for text-based person search. Experimental results show that our approach outperforms the state-of-the-art methods by 15% in terms of the top-1 metric. Experimental results show that our PMA outperforms the state-of-the-art methods on this dataset. Extensive ablation studies verify the effectiveness of each component in the PMA.
Researcher Affiliation | Academia | (1) Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR); (2) Center for Excellence in Brain Science and Intelligence Technology (CEBSIT), Institute of Automation, Chinese Academy of Sciences (CASIA); (3) University of Chinese Academy of Sciences (UCAS)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing code for the described methodology, or a link to a code repository.
Open Datasets | Yes | The CUHK-PEDES is currently the only dataset for text-based person search. We follow the same data split as (Li et al. 2017b).
Dataset Splits | Yes | The training set has 34054 images, 11003 persons and 68126 textual descriptions. The validation set has 3078 images, 1000 persons and 6158 textual descriptions. The test set has 3074 images, 1000 persons and 6156 textual descriptions.
Hardware Specification | Yes | This work is also supported by grants from NVIDIA and the NVIDIA DGX-1 AI Supercomputer.
Software Dependencies | No | The paper mentions software components like "VGG-16", "ResNet50", "bi-LSTM", "NLTK", and "Adam optimizer", but does not specify their version numbers.
Experiment Setup | Yes | In our experiments, we set both the hidden dimension of bi-LSTM and dimension b of the feature space as 1024. For pose CNN, the kernel size of each convolutional layer is 3x3 and the numbers of the convolutional channels are 64, 128, 256 and 256, respectively. The fully connected layer has 1024 nodes. First, we fix the parameters of pre-trained visual CNN and only train the other model parameters with a learning rate of 1e-3. Second, we release the parts of the visual CNN and train the entire model with a learning rate of 2e-4. The model is optimized with the Adam (Kingma and Ba 2014) optimizer. The batch size and margin are 128 and 0.2, respectively.
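As a quick consistency check, the split sizes quoted in the Dataset Splits row can be tallied; the resulting totals (40,206 images, 13,003 persons) agree with the figures commonly reported for CUHK-PEDES. A minimal sketch:

```python
# Split sizes as quoted from the paper's dataset description.
SPLITS = {
    "train": {"images": 34054, "persons": 11003, "descriptions": 68126},
    "val":   {"images": 3078,  "persons": 1000,  "descriptions": 6158},
    "test":  {"images": 3074,  "persons": 1000,  "descriptions": 6156},
}

# Sum each quantity across the three splits.
totals = {key: sum(split[key] for split in SPLITS.values())
          for key in ("images", "persons", "descriptions")}
# -> {'images': 40206, 'persons': 13003, 'descriptions': 80440}
```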