STA: Spatial-Temporal Attention for Large-Scale Video-Based Person Re-Identification

Authors: Yang Fu, Xiaoyang Wang, Yunchao Wei, Thomas Huang

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on two large-scale benchmarks, i.e., MARS and DukeMTMC-VideoReID.
Researcher Affiliation | Collaboration | Yang Fu (1,2), Xiaoyang Wang (2), Yunchao Wei (1), Thomas Huang (1); (1) IFP, Beckman, UIUC, IL; (2) Nokia Bell Labs, Murray Hill, NJ
Pseudocode | Yes | Algorithm 1: Algorithm of Feature Fusion Strategy (an illustrative fusion sketch appears after this table).
Open Source Code | No | The paper does not provide an explicit statement or link confirming that the source code for the described methodology is publicly available.
Open Datasets | Yes | The MARS dataset (Zheng et al. 2016) is one of the largest video-based person re-identification datasets.
Dataset Splits | Yes | The total of 1,261 identities is split into 625 identities for training and 636 identities for testing.
Hardware Specification | Yes | Our model is implemented on the PyTorch platform and trained with two NVIDIA TITAN X GPUs.
Software Dependencies | No | The paper mentions the "PyTorch platform" but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | Yes | We first randomly select N = 4 frames from the input tracklet and use the modified ResNet-50, initialized on the ImageNet (Deng et al. 2009) dataset, as the backbone network. The number of spatial regions is set to K = 4, and each frame is augmented by random horizontal flipping and normalization. Each mini-batch is sampled with P randomly selected identities and K randomly sampled images per identity from the training set; we set P = 16 and K = 4, so the mini-batch size is 64. We recommend setting the margin parameter in the triplet loss to 0.3. During training, we use Adam (Kingma and Ba 2014) with weight decay 0.0005 to optimize the parameters for 70 epochs. The overall learning rate is initialized to 0.0003 and decays to 3×10^-5 and 3×10^-6 after training for 200 and 400 epochs, respectively; the total training process lasts 800 epochs. (A minimal training-setup sketch also follows the table.)
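The Pseudocode row above cites Algorithm 1, the feature fusion strategy, but this page does not reproduce the algorithm itself. As a rough illustration only, the PyTorch sketch below fuses per-frame, per-region features using attention scores derived from L2 norms; the function name `fuse_frame_features`, the (N, K, C) tensor layout, and the two-branch (score-weighted sum plus best-frame selection) design are assumptions for illustration, not the authors' released code.

```python
import torch

def fuse_frame_features(feat: torch.Tensor) -> torch.Tensor:
    """Illustrative fusion of per-frame, per-region features.

    feat: (N, K, C) tensor -- N frames, K horizontal regions,
          C-dim feature per region (shapes are assumed, not from the paper).
    Returns a (K, 2*C) clip-level descriptor.
    """
    # Attention score per (frame, region): L2 norm of the region feature,
    # normalized over frames so that scores for each region sum to 1.
    norms = feat.norm(p=2, dim=2)                        # (N, K)
    scores = norms / norms.sum(dim=0, keepdim=True)      # (N, K)

    # Branch 1: score-weighted average over frames for each region.
    weighted = (scores.unsqueeze(2) * feat).sum(dim=0)   # (K, C)

    # Branch 2: for each region, take the feature from its highest-scoring frame.
    best = scores.argmax(dim=0)                          # (K,)
    picked = feat[best, torch.arange(feat.size(1))]      # (K, C)

    # Concatenate the two branches into the final descriptor.
    return torch.cat([weighted, picked], dim=1)          # (K, 2*C)
```

For example, a tracklet of N = 4 frames with K = 4 regions and 2048-dim region features, `fuse_frame_features(torch.randn(4, 4, 2048))`, yields a (4, 4096) descriptor.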
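The hyperparameters quoted in the Experiment Setup row map directly onto standard PyTorch training components. Below is a minimal sketch assuming the quoted learning-rate milestones (decay at epochs 200 and 400, i.e. gamma = 0.1) and the 800-epoch budget; the `nn.Linear` stand-in backbone and random input tensors are placeholders (the paper uses a modified ResNet-50 and samples batches of P identities with K images each), not the authors' implementation.

```python
import torch
from torch import nn, optim

# Placeholder backbone; the paper uses a modified ResNet-50 (not shown here).
model = nn.Linear(2048, 128)

# Adam with weight decay 0.0005 and initial learning rate 3e-4, as quoted.
optimizer = optim.Adam(model.parameters(), lr=3e-4, weight_decay=5e-4)

# 3e-4 -> 3e-5 after epoch 200 -> 3e-6 after epoch 400 (gamma = 0.1).
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 400], gamma=0.1)

# Triplet loss with the recommended margin of 0.3.
triplet = nn.TripletMarginLoss(margin=0.3)

P, K = 16, 4  # P identities x K images per identity -> mini-batch size 64
for epoch in range(800):
    # Random tensors stand in for a batch sampled with P identities and
    # K images each; real training mines anchor/positive/negative triplets.
    anchor = model(torch.randn(P * K, 2048))
    positive = model(torch.randn(P * K, 2048))
    negative = model(torch.randn(P * K, 2048))

    loss = triplet(anchor, positive, negative)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the learning-rate schedule once per epoch
```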