Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared Person Re-identification

Authors: Zhipeng Huang, Jiawei Liu, Liang Li, Kecheng Zheng, Zheng-Jun Zha

AAAI 2022, pp. 1034-1042

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on two challenging benchmarks demonstrate superior performance of MID over state-of-the-art methods.
Researcher Affiliation | Academia | 1 University of Science and Technology of China; 2 Institute of Computing Technology, Chinese Academy of Sciences. {hzp1104,zkcys001}@mail.ustc.edu.cn, {jwliu6,zhazj}@ustc.edu.cn, liang.li@ict.ac.cn
Pseudocode | No | The paper describes the proposed methods mathematically and descriptively but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not contain any statements about releasing open-source code or provide links to a code repository.
Open Datasets | Yes | We evaluate the proposed MID using two public RGB-Infrared datasets: RegDB (Nguyen et al. 2017) and SYSU-MM01 (Wu et al. 2017).
Dataset Splits | Yes | The RegDB dataset contains 412 pedestrians. Each pedestrian has 10 visible images and 10 thermal images. Following the evaluation protocol (Ye et al. 2018a,b), this dataset is randomly split into two parts, 206 identities for training and the other 206 identities for testing, with two different testing modes, i.e., visible-to-thermal mode and thermal-to-visible mode. The reported results are the average of 10 random training/test splits on the RegDB dataset. SYSU-MM01 (Wu et al. 2017) is the largest existing RGB-infrared dataset, which was captured with 4 visible and 2 infrared cameras. The training set contains 395 persons with 22,258 RGB images and 11,909 IR images, while the testing set contains 96 persons with 3,803 IR images and 301 RGB images. (See the split sketch after this table.)
Hardware Specification | Yes | The proposed method is implemented by the PyTorch framework with one NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions the 'PyTorch framework' but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | Each mini-batch contains 96 images of 8 identities (each person has 4 RGB images, 4 IR images, and 4 generated mixed-modality images). The ResNet-50 (He et al. 2016) model is adopted as the backbone network. Part-pooling (Sun et al. 2018) is added after the backbone. The first three residual blocks of the ResNet-50 model are equipped with modality-adaptive convolution decomposition. The stride of the last convolution layer is set to 1. The margin ρ is set to 0.3. The parameters µ and ξ are set to 1 and 0.1, respectively. The trade-off parameters λ1,4,5 are set to 1, λ2,3 are set to 0.5, and λ6 is set to 0.1 in Eq. (8). The Adam optimizer is adopted to train the actor-critic agent, and the stochastic gradient descent (SGD) optimizer is used for MACD with a momentum of 0.9 and an initial learning rate of 0.05 and 0.02 on the RegDB and SYSU-MM01 datasets, respectively. The learning rates decay by 0.1 after 20 and 45 epochs. The whole MID framework is trained for 60 epochs on the RegDB dataset, which takes 1 hour, and for 100 epochs on the SYSU-MM01 dataset, which takes 6 hours. (See the training-setup sketch after this table.)
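
The RegDB evaluation protocol quoted in the Dataset Splits row can be illustrated with a short sketch: identities are split into equal train/test halves and results are averaged over 10 random trials. This is a minimal illustration, not code from the paper; the function name, seeding scheme, and use of Python's random module are assumptions.

```python
import random

def regdb_identity_split(identity_ids, seed):
    """Randomly split RegDB identities into equal train/test halves.

    RegDB has 412 identities (10 visible + 10 thermal images each); the
    quoted protocol uses 206 identities for training, 206 for testing,
    and averages results over 10 random splits.
    """
    rng = random.Random(seed)
    ids = list(identity_ids)
    rng.shuffle(ids)
    half = len(ids) // 2  # 206 identities per half for RegDB
    return ids[:half], ids[half:]

# Illustrative usage: generate the 10 random trials whose results are averaged.
all_ids = range(412)
trials = [regdb_identity_split(all_ids, seed=t) for t in range(10)]
```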
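
Below is a minimal PyTorch sketch of the optimization settings quoted in the Experiment Setup row, assuming a standard torchvision ResNet-50. The MACD modules, part-pooling head, mixed-modality image generation, and the loss weights of Eq. (8) are not reproduced; the `agent` module is a placeholder for the paper's actor-critic agent, and ImageNet pretraining is a common assumption not stated in the quote.

```python
import torch
from torchvision.models import resnet50

# ResNet-50 backbone with the stride of the last convolution stage set to 1,
# as in the quoted setup (weights=None here; ImageNet pretraining is typical
# for re-identification backbones but is an assumption, not a quoted detail).
backbone = resnet50(weights=None)
backbone.layer4[0].conv2.stride = (1, 1)
backbone.layer4[0].downsample[0].stride = (1, 1)

# Main-network optimizer: SGD with momentum 0.9; initial lr 0.05 for RegDB
# (0.02 for SYSU-MM01), decayed by 0.1 after epochs 20 and 45.
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.05, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[20, 45], gamma=0.1
)

# The actor-critic agent is trained with Adam; this Linear layer is only a
# placeholder, since the agent architecture is not detailed in the quote.
agent = torch.nn.Linear(2048, 2)
agent_optimizer = torch.optim.Adam(agent.parameters())

# Mini-batch composition: 8 identities x (4 RGB + 4 IR + 4 mixed) = 96 images.
NUM_EPOCHS = {"RegDB": 60, "SYSU-MM01": 100}
```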