Relation-Guided Spatial Attention and Temporal Refinement for Video-Based Person Re-Identification

Authors: Xingze Li, Wengang Zhou, Yun Zhou, Houqiang Li

AAAI 2020, pp. 11434-11441 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on four prevalent benchmarks verify the state-of-the-art performance of the proposed method."
Researcher Affiliation | Academia | "Xingze Li, Wengang Zhou, Yun Zhou, Houqiang Li; CAS Key Laboratory of Technology in GIPAS, EEIS Department, University of Science and Technology of China; lixingze@mail.ustc.edu.cn, {zhwg, zhouyun, lihq}@ustc.edu.cn"
Pseudocode | No | The paper describes its methods with diagrams and equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statement about releasing code or a link to a source code repository.
Open Datasets | Yes | "MARS (Zheng et al. 2016) is one of the largest video-based person re-identification benchmarks... DukeMTMC-VideoReID (Wu et al. 2018b) is another large video-based person re-identification benchmark... iLIDS-VID (Wang et al. 2014) and PRID-2011 (Hirzer et al. 2011) are two small benchmarks."
Dataset Splits | Yes | "For the MARS and DukeMTMC-VideoReID datasets, we adopt the widely used training/testing splits provided by (Zheng et al. 2016) and (Wu et al. 2018b). For the iLIDS-VID and PRID-2011 datasets, we randomly split the identities equally into the training set and testing set." (A minimal sketch of this identity split appears after the table.)
Hardware Specification | Yes | "Our model is implemented by PyTorch and optimized using four NVIDIA Tesla V100 GPUs."
Software Dependencies | No | The paper names PyTorch as the implementation framework but gives no version numbers or other library dependencies: "Our model is implemented by PyTorch and optimized using four NVIDIA Tesla V100 GPUs."
Experiment Setup | Yes | "In the training phase, we randomly select T frames from a variable-length sequence to form a fixed-length input clip. Each batch consists of P identities and K input clips for each identity. In all our experiments, we select P = 18 and K = 4; therefore, the batch size is 72T. All images are resized to 256 × 128 and randomly horizontally flipped. Random erasing (Zhong et al. 2017) is also used as data augmentation. We use the ResNet-50 (He et al. 2016) pretrained on the ImageNet (Deng et al. 2009) dataset as the backbone network. The last pooling layer and fully connected layer are removed, and the stride of the last down-sampling in the conv5_x block is set to 1. The model is optimized using Adam (Kingma and Ba 2014) with weight decay 5 × 10⁻⁴. The initial learning rate is 3 × 10⁻⁴, and it is reduced to 3 × 10⁻⁵ and 3 × 10⁻⁶ after 125 and 250 training epochs. The model is trained for 375 epochs in total." (A hedged training-setup sketch appears after the table.)
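For the Dataset Splits row: the paper states only that the iLIDS-VID and PRID-2011 identities are split equally at random. Below is a minimal sketch of such a split; the function name split_identities_equally, the fixed seed, and the use of Python's random module are illustrative assumptions, not the authors' code.

```python
import random

def split_identities_equally(identity_ids, seed=0):
    # Hypothetical helper: shuffle all person IDs and divide them 50/50
    # into training and testing identity sets, as described for the
    # iLIDS-VID and PRID-2011 experiments.
    ids = list(identity_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])

# Example: iLIDS-VID has 300 identities, giving the usual 150/150 split.
train_ids, test_ids = split_identities_equally(range(300))
```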
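For the Experiment Setup row: the quoted recipe (ImageNet-pretrained ResNet-50 with the last down-sampling stride set to 1, Adam with weight decay 5 × 10⁻⁴, and a learning rate decayed tenfold from 3 × 10⁻⁴ after epochs 125 and 250) can be sketched in PyTorch as follows. This is a hedged illustration assuming torchvision's ResNet implementation; the helper name build_backbone and the training-loop placeholder are not from the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def build_backbone():
    # Hypothetical helper: ImageNet-pretrained ResNet-50 with the final
    # pooling and fully connected layers removed.
    net = resnet50(pretrained=True)
    # Change the stride of the last down-sampling (the first conv5_x /
    # layer4 block) from 2 to 1, so a 256x128 input yields a 16x8
    # feature map instead of 8x4.
    net.layer4[0].conv2.stride = (1, 1)
    net.layer4[0].downsample[0].stride = (1, 1)
    # Keep only the convolutional trunk (drop avgpool and fc).
    return nn.Sequential(*list(net.children())[:-2])

model = build_backbone()

# Adam with weight decay 5e-4; lr starts at 3e-4 and is cut 10x after
# epochs 125 and 250, matching the paper's 375-epoch schedule.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[125, 250], gamma=0.1)

for epoch in range(375):
    # ... one epoch over batches of P x K = 18 x 4 clips of T frames each ...
    scheduler.step()
```

Setting the last stride to 1 is a common re-identification trick: it doubles the spatial resolution of the final feature map at no extra parameter cost, which is generally done so attention modules can operate on finer spatial detail.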