Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search
Authors: Ya Jing, Chenyang Si, Junbo Wang, Wei Wang, Liang Wang, Tieniu Tan (pp. 11189-11196)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of our model, we perform extensive experiments on the CUHK Person Description Dataset (CUHK-PEDES) which is currently the only available dataset for text-based person search. Experimental results show that our approach outperforms the state-of-the-art methods by 15 % in terms of the top-1 metric. Experimental results show that our PMA outperforms the state-of-the-art methods on this dataset. Extensive ablation studies verify the effectiveness of each component in the PMA. |
| Researcher Affiliation | Academia | 1Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR) 2Center for Excellence in Brain Science and Intelligence Technology (CEBSIT), Institute of Automation, Chinese Academy of Sciences (CASIA) 3University of Chinese Academy of Sciences (UCAS) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | The CUHK-PEDES is currently the only dataset for text-based person search. We follow the same data split as (Li et al. 2017b). |
| Dataset Splits | Yes | The training set has 34054 images, 11003 persons and 68126 textual descriptions. The validation set has 3078 images, 1000 persons and 6158 textual descriptions. The test set has 3074 images, 1000 persons and 6156 textual descriptions. |
| Hardware Specification | Yes | This work is also supported by grants from NVIDIA and the NVIDIA DGX-1 AI Supercomputer. |
| Software Dependencies | No | The paper mentions software components like "VGG-16", "ResNet50", "bi-LSTM", "NLTK", and "Adam optimizer", but does not specify their version numbers. |
| Experiment Setup | Yes | In our experiments, we set both the hidden dimension of bi-LSTM and dimension b of the feature space as 1024. For pose CNN, the kernel size of each convolutional layer is 3x3 and the numbers of the convolutional channels are 64, 128, 256 and 256, respectively. The fully connected layer has 1024 nodes. First, we fix the parameters of pre-trained visual CNN and only train the other model parameters with a learning rate of 1e-3. Second, we release the parts of the visual CNN and train the entire model with a learning rate of 2e-4. The model is optimized with the Adam (Kingma and Ba 2014) optimizer. The batch size and margin are 128 and 0.2, respectively. |
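The setup row above fully specifies the pose CNN's layer configuration and the two-stage training schedule. A minimal PyTorch sketch under stated assumptions — the input channel count (keypoint heatmaps), the ReLU + 2x2 max-pooling after each conv layer, and the global average pooling before the fully connected layer are all assumptions not given in the paper:

```python
import torch
import torch.nn as nn

class PoseCNN(nn.Module):
    """Pose CNN per the reported setup: four 3x3 conv layers with
    64/128/256/256 channels, then a 1024-node fully connected layer."""

    def __init__(self, in_channels=18):  # in_channels is an assumption (keypoint heatmaps)
        super().__init__()
        blocks, prev = [], in_channels
        for ch in (64, 128, 256, 256):
            blocks += [
                nn.Conv2d(prev, ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # assumed downsampling; not specified in the paper
            ]
            prev = ch
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)  # assumed global pooling before the FC layer
        self.fc = nn.Linear(256, 1024)       # "fully connected layer has 1024 nodes"

    def forward(self, x):
        return self.fc(self.pool(self.features(x)).flatten(1))

pose_cnn = PoseCNN()
feat = pose_cnn(torch.randn(2, 18, 64, 64))
print(feat.shape)  # torch.Size([2, 1024])

# Two-stage schedule from the setup: first train everything except the
# frozen pre-trained visual CNN at lr 1e-3, then unfreeze parts of the
# visual CNN and train the whole model at lr 2e-4, both with Adam.
stage1_opt = torch.optim.Adam(pose_cnn.parameters(), lr=1e-3)
stage2_opt = torch.optim.Adam(pose_cnn.parameters(), lr=2e-4)
```

This sketch only reproduces the hyperparameters the report quotes (channel widths, FC size, learning rates, optimizer); the real pose stream in PMA also interacts with the visual and textual branches, which is outside this snippet's scope.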