Appearance and Motion Enhancement for Video-Based Person Re-Identification

Authors: Shuzhao Li, Huimin Yu, Haoji Hu

AAAI 2020, pp. 11394-11401

Reproducibility assessment (variable: result, followed by the supporting LLM response):
Research Type: Experimental. Evidence: "Extensive experiments conducted on three popular video-based person Re-ID benchmarks demonstrate the effectiveness of our proposed model and the state-of-the-art performance compared with existing methods."
Researcher Affiliation: Academia. Evidence: "Shuzhao Li (1), Huimin Yu (1,2), Haoji Hu (1); (1) College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; (2) The State Key Laboratory of CAD and CG, Zhejiang University, Hangzhou, China."
Pseudocode: No. The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide any concrete access to source code for the methodology described.
Open Datasets: Yes. Evidence: "MARS (Zheng et al. 2016) is currently one of the largest video-based person re-identification datasets, which consists of 1261 identities and around 20000 human walking sequences. iLIDS-VID (Wang et al. 2014) is composed of 600 sequences belonging to 300 different pedestrians from two non-overlapping cameras. PRID-2011 (Hirzer et al. 2011) consists of 749 different identities from one camera and 385 identities from the other, with only the first 200 people appearing in both cameras."
Dataset Splits: Yes. Evidence: "We follow the original splits provided by MARS, and for iLIDS-VID and PRID-2011, we follow the evaluation protocol from previous works (Wang et al. 2014), where the dataset is randomly split into train/test sets 10 times and the averaged accuracies are reported." (A sketch of this repeated-split protocol appears below.)
Hardware Specification: Yes. Evidence: "We implement our proposed algorithm based on the PyTorch framework on two GTX 1080Ti GPUs with 11GB memory."
Software Dependencies: No. The paper names only the framework ("We implement our proposed algorithm based on the PyTorch framework on two GTX 1080Ti GPUs with 11GB memory.") and specifies no version for PyTorch or any other library.
Experiment Setup: Yes. Evidence: "The initial learning rate is set to 1e-3, and decreased by a factor of 0.2 every 60 epochs. The weight decay is set to 5e-4. The length of the input sequence T is empirically set to 8. The input frames are resized to 256 × 128. The sizes of the feature maps in our model are set to H = 16, W = 8, C = 1024, T = 3, and the dimension of the final feature f_s is set to 512. The hyperparameters k, λ_A, λ_M are set to 0.2, 0.1, and 10, respectively." (A configuration sketch using these values appears below.)
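
For the Dataset Splits row, the repeated-split protocol of Wang et al. (2014) can be sketched as follows. This is a minimal illustration, not the authors' code: the half/half identity split is the standard convention for iLIDS-VID and PRID-2011, and the `run_train_and_test` callback is a hypothetical placeholder for training on one identity set and evaluating on the other.

```python
import random

def average_over_splits(person_ids, run_train_and_test, num_splits=10, seed=0):
    """Average accuracy over repeated random train/test identity splits.

    Sketch of the Wang et al. (2014) protocol: identities are split
    half/half into train and test sets, num_splits times, and the
    resulting accuracies are averaged. `run_train_and_test` is a
    hypothetical callback that trains on the train identities and
    returns a test accuracy (e.g. rank-1).
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(num_splits):
        ids = list(person_ids)
        rng.shuffle(ids)
        half = len(ids) // 2
        accuracies.append(run_train_and_test(ids[:half], ids[half:]))
    return sum(accuracies) / num_splits
```

For iLIDS-VID, each run would split its 300 identities into 150 train / 150 test; the reported number is the mean over the 10 runs.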
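The Experiment Setup row also maps directly onto standard PyTorch components. The sketch below is an assumption about how such a schedule is typically wired up, since the paper publishes no code: the optimizer type and epoch count are not given in the excerpt (Adam and 150 are placeholders), and the `nn.Linear` model stands in for the paper's actual network.

```python
import torch
from torch import nn
from torchvision import transforms

# Values quoted from the paper's experiment setup.
LR, WEIGHT_DECAY, SEQ_LEN, FEAT_DIM = 1e-3, 5e-4, 8, 512

# Input frames are resized to 256 x 128 (height x width).
frame_transform = transforms.Compose([
    transforms.Resize((256, 128)),
    transforms.ToTensor(),
])

model = nn.Linear(2048, FEAT_DIM)  # placeholder for the actual Re-ID network
model = nn.DataParallel(model)     # two-GPU setup, per the hardware row

# Optimizer type is an assumption; the excerpt does not name one.
optimizer = torch.optim.Adam(model.parameters(), lr=LR,
                             weight_decay=WEIGHT_DECAY)

# "decreased by a factor of 0.2 every 60 epochs": multiply the LR by 0.2
# at 60-epoch intervals.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.2)

for epoch in range(150):  # epoch count is a placeholder
    # ... one training pass over length-SEQ_LEN clips would run here ...
    optimizer.step()      # placeholder; real code runs forward/backward first
    scheduler.step()
```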