Frame-Guided Region-Aligned Representation for Video Person Re-Identification

Authors: Zengqun Chen, Zhiheng Zhou, Junchu Huang, Pengyu Zhang, Bo Li (pp. 10591-10598)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments are conducted on benchmark datasets to demonstrate the effectiveness of the proposed method in solving the misalignment problem and its superiority over existing video-based person re-identification methods.
Researcher Affiliation | Academia | Zengqun Chen, Zhiheng Zhou, Junchu Huang, Pengyu Zhang, Bo Li; South China University of Technology, Guangzhou, China; {eechenzq, eehjc, eezhangpy}@mail.scut.edu.cn, {zhouzh, leebo}@scut.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; the methods are described in narrative text with mathematical formulations.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. There is no mention of a code release, repository links, or code in supplementary materials.
Open Datasets | Yes | We evaluate our approach on three well-known datasets, including iLIDS-VID (Wang et al. 2014), PRID-2011 (Hirzer et al. 2011) and MARS (Zheng et al. 2016).
Dataset Splits | No | For the iLIDS-VID and PRID-2011 datasets, we follow the implementation in previous works (Wang et al. 2014) and randomly split each dataset into 50% of persons for training and 50% of persons for testing. The MARS dataset provides fixed training and testing sets, which contain 8298 predefined sequences of 625 persons for training and 12180 sequences of 636 persons for testing, including 3248 low-quality sequences in the gallery set. The paper specifies training and testing splits but does not explicitly mention a separate validation split.
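
The identity-level 50/50 protocol described above can be illustrated with a minimal sketch; this is not the authors' code (none is released), and the function name, fixed seed and the use of iLIDS-VID's 300 identities are illustrative assumptions.

    import random

    def split_identities(person_ids, train_ratio=0.5, seed=0):
        """Randomly split person identities into disjoint train/test sets."""
        ids = sorted(set(person_ids))
        rng = random.Random(seed)      # fixed seed so a given split can be reproduced
        rng.shuffle(ids)
        n_train = int(len(ids) * train_ratio)
        return ids[:n_train], ids[n_train:]

    # Example: iLIDS-VID contains 300 identities, so 150 are used for training and
    # 150 for testing; results are usually averaged over several such random trials.
    train_ids, test_ids = split_identities(range(300))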
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU or CPU models, or any other specifications of the machines used to run its experiments.
Software Dependencies | No | The paper does not provide ancillary software details with version numbers (e.g., library names with versions such as Python 3.8 or PyTorch 1.9) needed to replicate the experiment.
Experiment Setup | Yes | The input size of each frame is set to 256 × 128 pixels, randomly cropped from a scaled image whose size is enlarged by 1/8. We then apply image-level augmentation to the whole sequence, including mirroring, normalization and random erasing (Zhong et al. 2017b). In order to train the hard-mining triplet loss, 16 identities with 4 tracklets per person are taken in a mini-batch, so that the mini-batch size is 64. The number of spatial regions Ns in the local branch is set to 4. For better optimization of our model, we recommend setting the margin parameter in the triplet loss to 0.5 and the coefficient associated with the center loss to 5e-4. For network parameter training, we adopt Adam with a weight decay of 0.0005. The model is trained for 300 epochs in total, starting with a learning rate of 0.03 for the parameters in the center loss and 0.0003 for the others. The learning rate is reduced by a factor of ten after every 100 epochs.
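
These reported hyper-parameters can be mirrored in a short PyTorch sketch, assuming a stand-in backbone and center-loss module and ImageNet normalization statistics; it is a reconstruction for illustration, not the authors' implementation.

    import torch
    from torch import nn, optim
    from torch.optim.lr_scheduler import StepLR
    from torchvision import transforms

    # Frame-level augmentation: enlarge by 1/8 (256x128 -> 288x144), random crop back,
    # mirroring, normalization (ImageNet statistics assumed) and random erasing.
    augment = transforms.Compose([
        transforms.Resize((288, 144)),
        transforms.RandomCrop((256, 128)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        transforms.RandomErasing(),
    ])

    # Placeholders standing in for the paper's network and its learnable class centers.
    model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                          nn.Flatten(), nn.Linear(64, 128))
    centers = nn.Linear(128, 625, bias=False)  # one center per MARS training identity

    # Mini-batch composition: P=16 identities x K=4 tracklets = 64 sequences per batch.
    P, K = 16, 4

    # Adam with weight decay 5e-4; lr 3e-2 for the center-loss parameters, 3e-4 for the
    # rest, decayed by a factor of 10 every 100 epochs over 300 epochs in total.
    optimizer = optim.Adam(
        [{"params": model.parameters(), "lr": 3e-4},
         {"params": centers.parameters(), "lr": 3e-2}],
        weight_decay=5e-4,
    )
    scheduler = StepLR(optimizer, step_size=100, gamma=0.1)

    triplet_loss = nn.TripletMarginLoss(margin=0.5)  # hard mining is applied when forming triplets
    lambda_center = 5e-4                             # weight of the center-loss term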