Relation-Guided Spatial Attention and Temporal Refinement for Video-Based Person Re-Identification
Authors: Xingze Li, Wengang Zhou, Yun Zhou, Houqiang Li
AAAI 2020, pp. 11434–11441
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four prevalent benchmarks verify the state-of-the-art performance of the proposed method. |
| Researcher Affiliation | Academia | Xingze Li, Wengang Zhou, Yun Zhou, Houqiang Li CAS Key Laboratory of Technology in GIPAS, EEIS Department, University of Science and Technology of China lixingze@mail.ustc.edu.cn, {zhwg, zhouyun, lihq}@ustc.edu.cn |
| Pseudocode | No | The paper describes methods using diagrams and equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statement about releasing code or a link to a source code repository. |
| Open Datasets | Yes | MARS (Zheng et al. 2016) is one of the largest video-based person re-identification benchmarks... DukeMTMC-VideoReID (Wu et al. 2018b) is another large video-based person re-identification benchmark... iLIDS-VID (Wang et al. 2014) and PRID-2011 (Hirzer et al. 2011) are two small benchmarks. |
| Dataset Splits | Yes | For the MARS and DukeMTMC-VideoReID datasets, we adopt the widely used training/testing splits provided by (Zheng et al. 2016) and (Wu et al. 2018b). For the iLIDS-VID and PRID-2011 datasets, we randomly split the identities equally into the training set and testing set. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | Our model is implemented by PyTorch and optimized using four NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions PyTorch ("Our model is implemented by PyTorch...") but does not report versions for PyTorch or any other software dependency. |
| Experiment Setup | Yes | In the training phase, we randomly select T frames from a variable-length sequence to form a fixed-length input clip. Each batch consists of P identities and K input clips for each identity. In all our experiments, we select P = 18 and K = 4; therefore, the batch size is 72T. All images are resized to 256 × 128 and randomly horizontally flipped. Random erasing (Zhong et al. 2017) is also used as data augmentation. We use the ResNet-50 (He et al. 2016) pretrained on the ImageNet (Deng et al. 2009) dataset as the backbone network. The last pooling layer and fully connected layer are removed, and the stride of the last down-sampling in the conv5_x block is set to 1. The model is optimized using Adam (Kingma and Ba 2014) with weight decay 5 × 10⁻⁴. The initial learning rate is 3 × 10⁻⁴, and it is reduced to 3 × 10⁻⁵ and 3 × 10⁻⁶ after training 125 and 250 epochs. The model is trained for 375 epochs in total. (A hedged PyTorch sketch of this setup follows the table.) |
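The iLIDS-VID/PRID-2011 protocol above halves the identities at random between training and testing. Below is a minimal sketch of such a split, assuming identities are keyed by integer IDs; the helper name and the fixed seed are hypothetical conveniences, not from the paper.

```python
import random

def split_identities(person_ids, seed=0):
    """Hypothetical helper: randomly split identities 50/50 into
    training and testing sets, as described for iLIDS-VID and PRID-2011."""
    ids = list(person_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])

# e.g. iLIDS-VID has 300 identities -> 150 for training, 150 for testing
train_ids, test_ids = split_identities(range(300))
```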
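And here is a sketch of the reported experiment setup, assuming PyTorch/torchvision: the backbone modification (pooling and FC layers removed, stride 1 in the last conv5_x down-sampling), the stated augmentations, and the Adam schedule. The paper's relation-guided attention and temporal-refinement modules are omitted, and T = 4 is an assumption since the excerpt only says "T frames".

```python
import torch
import torchvision
from torchvision import transforms

# ImageNet-pretrained ResNet-50; drop the final pooling and FC layers
# and set the last down-sampling stride in conv5_x (layer4) to 1.
resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
resnet.layer4[0].conv2.stride = (1, 1)
resnet.layer4[0].downsample[0].stride = (1, 1)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])

# Augmentation: resize to 256 x 128, random horizontal flip, random erasing.
train_transform = transforms.Compose([
    transforms.Resize((256, 128)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.RandomErasing(),  # Zhong et al. 2017
])

# Batch composition: P identities x K clips of T frames each.
P, K, T = 18, 4, 4   # T = 4 is an assumption; the excerpt leaves T open
batch_clips = P * K  # 72 clips, i.e. 72 * T frames per batch

# Adam with weight decay 5e-4; lr 3e-4 -> 3e-5 -> 3e-6 at epochs 125/250.
optimizer = torch.optim.Adam(backbone.parameters(), lr=3e-4, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[125, 250], gamma=0.1)

for epoch in range(375):
    # ... one training epoch over batches of P * K clips ...
    scheduler.step()
```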