TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification

Authors: Shengcai Liao, Ling Shao

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed method, called TransMatcher, achieves state-of-the-art performance in generalizable person re-identification, with up to 6.1% and 5.7% performance gains in Rank-1 and mAP, respectively, on several popular datasets. Code is available at https://github.com/ShengcaiLiao/QAConv.
Researcher Affiliation | Industry | Shengcai Liao and Ling Shao, Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, UAE
Pseudocode | No | The paper describes the method and illustrates it with a block diagram (Figure 1), but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/ShengcaiLiao/QAConv.
Open Datasets | Yes | Four large-scale person re-identification datasets, CUHK03 [8], Market-1501 [34], MSMT17 [28], and RandPerson [27], all publicly available for research purposes, are used in the experiments.
Dataset Splits | No | The paper describes training and testing splits for each dataset (e.g., CUHK03: "767 and 700 subjects used for training and testing, respectively"), but does not explicitly detail a separate validation split.
Hardware Specification | Yes | All experiments are run on a single NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The input image is resized to 384×128. The batch size is set to 64, with K=4 for the GS sampler. The network is trained with the SGD optimizer, with a learning rate of 0.0005 for the backbone network and 0.005 for newly added layers. Learning rates are decayed by 0.1 after 10 epochs, and 15 epochs are trained in total. Gradient clipping is applied with T=4. Several commonly used data augmentation methods are applied, including random flipping, cropping, occlusion, and color jittering. For the proposed TransMatcher, unless otherwise indicated, d=512 and D=2048 by default as in the original Transformer [24], and H=1 and N=3 for higher efficiency.
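
The training schedule in the setup row above can be sketched as a minimal, stdlib-only configuration. This is an illustrative reconstruction of the stated hyperparameters, not code from the released QAConv repository; the constant and function names are assumptions.

```python
# Hyperparameters as reported in the paper's experiment setup.
BACKBONE_LR = 0.0005   # learning rate for the backbone network
NEW_LAYER_LR = 0.005   # learning rate for newly added layers
DECAY_FACTOR = 0.1     # multiplicative step decay
DECAY_EPOCH = 10       # decay applied after 10 epochs
TOTAL_EPOCHS = 15      # total training epochs
BATCH_SIZE = 64        # with K=4 instances per identity for the GS sampler

def lr_at_epoch(base_lr: float, epoch: int) -> float:
    """Step decay: multiply the base rate by 0.1 once 10 epochs have completed.

    Epochs are 0-indexed here, so epochs 10-14 run at the decayed rate.
    """
    return base_lr * (DECAY_FACTOR if epoch >= DECAY_EPOCH else 1.0)

# Per-epoch (backbone_lr, new_layer_lr) schedule over the full run.
schedule = [
    (lr_at_epoch(BACKBONE_LR, e), lr_at_epoch(NEW_LAYER_LR, e))
    for e in range(TOTAL_EPOCHS)
]
```

In PyTorch this two-rate scheme would typically be expressed as two parameter groups passed to `torch.optim.SGD`, with `torch.optim.lr_scheduler.StepLR` handling the decay; the sketch above only encodes the numeric schedule itself.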