Interact, Embed, and EnlargE: Boosting Modality-Specific Representations for Multi-Modal Person Re-identification
Authors: Zi Wang, Chenglong Li, Aihua Zheng, Ran He, Jin Tang
AAAI 2022, pp. 2633-2641 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Superior performance on the multi-modal Re-ID dataset RGBNT201 and three constructed Re-ID datasets validates the effectiveness of the proposed method compared with state-of-the-art approaches. Experiments: We evaluate the proposed method IEEE on the benchmark multi-modal person Re-ID dataset RGBNT201 (Zheng et al. 2021) and a constructed multi-modal dataset based on Market1501 (Zheng et al. 2015), comparing with state-of-the-art methods. Ablation Study: Our method consists of three key components: the cross-modal interacting module (CIM), the relation-based embedding module (REM), and the multi-modal margin loss (3M loss). To evaluate the contribution of each component, we conduct an ablation experiment on RGBNT201 by progressively introducing each component, as shown in Table 3. |
| Researcher Affiliation | Academia | Zi Wang (3), Chenglong Li (1,2,4), Aihua Zheng (1,2,4), Ran He (5), Jin Tang (1,2,3). Affiliations: 1 Information Materials and Intelligent Sensing Laboratory of Anhui Province; 2 Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; 3 School of Computer Science and Technology, Anhui University; 4 School of Artificial Intelligence, Anhui University; 5 NLPR, CRIPAC, Institute of Automation, Chinese Academy of Sciences. {ziwang1121, lcl1314, ahzheng214}@foxmail.com, rhe@nlpr.ia.ac.cn, tangjin@ahu.edu.cn |
| Pseudocode | No | The paper describes the proposed modules and their operations in detail but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a statement about releasing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | RGBNT201 (Zheng et al. 2021) is the first multi-modal person Re-ID dataset. Market1501 (Zheng et al. 2015) is a scalable RGB single-modal person Re-ID dataset. |
| Dataset Splits | Yes | RGBNT201 (Zheng et al. 2021) is the first multi-modal person Re-ID dataset. It contains 4787 image triplets of 201 persons, with 141 identities for training, 30 identities for validation, and 30 identities for testing. |
| Hardware Specification | Yes | The implementation platform is PyTorch with a GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'ImageNet' but does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | The initial learning rate is set to 0.001, and we reduce the learning rate by a factor of 10 at epoch 20 and epoch 40. The mini-batch size is 8. The feature maps after the CIM module are equally split into 6 stripes. The dimension of each part feature is reduced to 128 by the FC layer. Thus, the feature dimension of each modality (f_embed) is 6 × 128 = 768, and the final feature (f_final) of an individual person is 768 × 3 = 2304-dim. Both cross-entropy loss and multi-modal margin loss are used in the training phase; we set the margin in the multi-modal margin loss to 1, and the δ in the final loss to 1. We use Stochastic Gradient Descent (SGD) with a momentum of 0.9 and weight decay of 0.0005 to fine-tune the network. (See the sketches after this table.) |
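
For concreteness, here is a minimal PyTorch sketch of the reported training configuration. The model stub, `num_ids`, and `final_loss` are hypothetical placeholders (the paper's code is not released, and the exact form of the 3M loss is not reproduced here); only the optimizer, schedule, and loss weighting mirror the reported values.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder for the IEEE network (CIM + REM); any nn.Module works here.
# This stand-in simply maps the 2304-dim final feature to identity logits.
num_ids = 141  # training identities in RGBNT201
model = nn.Linear(2304, num_ids)

# SGD with momentum 0.9 and weight decay 0.0005, initial lr 0.001, as reported.
optimizer = optim.SGD(model.parameters(), lr=0.001,
                      momentum=0.9, weight_decay=0.0005)

# Drop the learning rate by 10x at epochs 20 and 40 (call scheduler.step()
# once per epoch).
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 40],
                                           gamma=0.1)

ce = nn.CrossEntropyLoss()

def final_loss(logits, labels, m3, delta=1.0):
    # delta = 1 as reported; `m3` stands for the multi-modal margin (3M)
    # loss value (margin = 1), whose definition is given in the paper.
    return ce(logits, labels) + delta * m3
```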
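
The feature-dimension arithmetic in the setup can also be made explicit. The module below is a hypothetical part-based head that only mirrors the reported dimensions: 6 horizontal stripes per modality, each reduced to 128-dim by an FC layer (f_embed = 6 × 128 = 768), with the three modality embeddings concatenated into the 2304-dim final feature. Sharing one head across modalities is a simplification for illustration.

```python
import torch
import torch.nn as nn

class PartEmbedding(nn.Module):
    """Hypothetical part-based head mirroring the reported dimensions:
    6 stripes, each reduced to 128-dim, so each modality yields 768-dim."""
    def __init__(self, in_channels=2048, num_parts=6, part_dim=128):
        super().__init__()
        self.num_parts = num_parts
        self.fcs = nn.ModuleList(
            [nn.Linear(in_channels, part_dim) for _ in range(num_parts)])

    def forward(self, feat_map):  # feat_map: (B, C, H, W)
        stripes = feat_map.chunk(self.num_parts, dim=2)  # 6 stripes along H
        parts = [fc(s.mean(dim=(2, 3))) for fc, s in zip(self.fcs, stripes)]
        return torch.cat(parts, dim=1)  # (B, 6 * 128) = (B, 768)

head = PartEmbedding()
# Dummy backbone outputs for the RGB, NIR, and TIR modalities.
rgb, nir, tir = (torch.randn(8, 2048, 24, 8) for _ in range(3))
f_final = torch.cat([head(m) for m in (rgb, nir, tir)], dim=1)
print(f_final.shape)  # torch.Size([8, 2304]) -> the reported final feature
```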