Interact, Embed, and EnlargE: Boosting Modality-Specific Representations for Multi-Modal Person Re-identification
Authors: Zi Wang, Chenglong Li, Aihua Zheng, Ran He, Jin Tang
AAAI 2022, pp. 2633-2641 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Superior performance on the multi-modal Re-ID dataset RGBNT201 and three constructed Re-ID datasets validates the effectiveness of the proposed method compared with state-of-the-art approaches. Experiments: We evaluate the proposed method IEEE on the benchmark multi-modal person Re-ID dataset RGBNT201 (Zheng et al. 2021) and a constructed multi-modal dataset based on Market1501 (Zheng et al. 2015), comparing with state-of-the-art methods. Ablation Study: Our method consists of three key components: the cross-modal interacting module (CIM), the relation-based embedding module (REM), and the multi-modal margin loss (3M loss). To evaluate the contribution of each component, we conduct an ablation experiment on RGBNT201 by progressively introducing each component, as shown in Table 3. |
| Researcher Affiliation | Academia | Zi Wang (3), Chenglong Li (1,2,4), Aihua Zheng (1,2,4), Ran He (5), Jin Tang (1,2,3). Affiliations: 1 Information Materials and Intelligent Sensing Laboratory of Anhui Province; 2 Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; 3 School of Computer Science and Technology, Anhui University; 4 School of Artificial Intelligence, Anhui University; 5 NLPR, CRIPAC, Institute of Automation, Chinese Academy of Sciences. {ziwang1121, lcl1314, ahzheng214}@foxmail.com, rhe@nlpr.ia.ac.cn, tangjin@ahu.edu.cn |
| Pseudocode | No | The paper describes the proposed modules and their operations in detail but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a statement about releasing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | RGBNT201 (Zheng et al. 2021) is the first multi-modal person Re-ID dataset. Market1501 (Zheng et al. 2015) is a scalable RGB single-modal person Re-ID dataset. |
| Dataset Splits | Yes | RGBNT201 (Zheng et al. 2021) is the first multi-modal person Re-ID dataset. It contains 4787 image triplets of 201 persons, with 141 identities for training, 30 identities for validation, and 30 identities for testing. |
| Hardware Specification | Yes | The implementation platform is PyTorch with a GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'ImageNet' but does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | The initial learning rate is set to 0.001, and we reduce the learning rate by a factor of 10 at epoch 20 and epoch 40. The mini-batch size is 8. The feature maps after the CIM module are equally split into 6 stripes. The dimension of each part feature is reduced to 128 by the FC layer. Thus, the feature dimension of each modality (f_embed) is 6 × 128 = 768, and the final feature (f_final) of an individual person is 768 × 3 = 2304-dim. Both cross-entropy loss and multi-modal margin loss are used in the training phase; we set the margin in the multi-modal margin loss to 1, and the δ in the final loss to 1. We use Stochastic Gradient Descent (SGD) with a momentum of 0.9 and weight decay of 0.0005 to fine-tune the network. (See the sketches after this table.) |
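
For concreteness, here is a minimal PyTorch sketch of the reported training configuration. The model stub, `num_ids`, and `final_loss` are hypothetical placeholders (the paper's code is not released, and the exact form of the 3M loss is not reproduced here); only the optimizer, schedule, and loss weighting mirror the reported values.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder for the IEEE network (CIM + REM); any nn.Module works here.
# This stand-in simply maps the 2304-dim final feature to identity logits.
num_ids = 141  # training identities in RGBNT201
model = nn.Linear(2304, num_ids)

# SGD with momentum 0.9 and weight decay 0.0005, initial lr 0.001, as reported.
optimizer = optim.SGD(model.parameters(), lr=0.001,
                      momentum=0.9, weight_decay=0.0005)

# Drop the learning rate by 10x at epochs 20 and 40 (call scheduler.step()
# once per epoch).
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 40],
                                           gamma=0.1)

ce = nn.CrossEntropyLoss()

def final_loss(logits, labels, m3, delta=1.0):
    # delta = 1 as reported; `m3` stands for the multi-modal margin (3M)
    # loss value (margin = 1), whose definition is given in the paper.
    return ce(logits, labels) + delta * m3
```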
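
The feature-dimension arithmetic in the setup can also be made explicit. The module below is a hypothetical part-based head that only mirrors the reported dimensions: 6 horizontal stripes per modality, each reduced to 128-dim by an FC layer (f_embed = 6 × 128 = 768), with the three modality embeddings concatenated into the 2304-dim final feature. Sharing one head across modalities is a simplification for illustration.

```python
import torch
import torch.nn as nn

class PartEmbedding(nn.Module):
    """Hypothetical part-based head mirroring the reported dimensions:
    6 stripes, each reduced to 128-dim, so each modality yields 768-dim."""
    def __init__(self, in_channels=2048, num_parts=6, part_dim=128):
        super().__init__()
        self.num_parts = num_parts
        self.fcs = nn.ModuleList(
            [nn.Linear(in_channels, part_dim) for _ in range(num_parts)])

    def forward(self, feat_map):  # feat_map: (B, C, H, W)
        stripes = feat_map.chunk(self.num_parts, dim=2)  # 6 stripes along H
        parts = [fc(s.mean(dim=(2, 3))) for fc, s in zip(self.fcs, stripes)]
        return torch.cat(parts, dim=1)  # (B, 6 * 128) = (B, 768)

head = PartEmbedding()
# Dummy backbone outputs for the RGB, NIR, and TIR modalities.
rgb, nir, tir = (torch.randn(8, 2048, 24, 8) for _ in range(3))
f_final = torch.cat([head(m) for m in (rgb, nir, tir)], dim=1)
print(f_final.shape)  # torch.Size([8, 2304]) -> the reported final feature
```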