STA: Spatial-Temporal Attention for Large-Scale Video-Based Person Re-Identification
Authors: Yang Fu, Xiaoyang Wang, Yunchao Wei, Thomas Huang
AAAI 2019, pp. 8287-8294
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on two large-scale benchmarks, i.e., MARS and DukeMTMC-VideoReID. |
| Researcher Affiliation | Collaboration | Yang Fu (1,2), Xiaoyang Wang (2), Yunchao Wei (1), Thomas Huang (1); 1: IFP, Beckman, UIUC, IL; 2: Nokia Bell Labs, Murray Hill, NJ |
| Pseudocode | Yes | Algorithm 1: Algorithm of Feature Fusion Strategy (a hedged sketch follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link confirming that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The MARS dataset (Zheng et al. 2016) is one of the largest video-based person re-identification datasets. |
| Dataset Splits | Yes | The total 1,261 identities are split into 625 identities for training and 636 identities for testing. (An identity-disjoint split sketch follows the table.) |
| Hardware Specification | Yes | Our model is implemented on the PyTorch platform and trained with two NVIDIA TITAN X GPUs. |
| Software Dependencies | No | The paper mentions the "PyTorch platform" but does not specify its version number or any other software dependencies with their versions. |
| Experiment Setup | Yes | We first randomly select N = 4 frames from the input tracklet and use a modified ResNet50, initialized on the ImageNet (Deng et al. 2009) dataset, as the backbone network. The number of spatial regions is set to K = 4, and each frame is augmented by random horizontal flipping and normalization. Each mini-batch is sampled with P randomly selected identities and K randomly sampled images per identity from the training set; we set P = 16 and K = 4, so the mini-batch size is 64. We recommend setting the margin parameter of the triplet loss to 0.3. During training, we use Adam (Kingma and Ba 2014) with weight decay 0.0005 to optimize the parameters for 70 epochs. The overall learning rate is initialized to 0.0003 and decayed to 3×10^-5 and 3×10^-6 after training for 200 and 400 epochs respectively. The total training process lasts 800 epochs. (A minimal training-setup sketch follows the table.) |
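The Pseudocode row above quotes Algorithm 1 (Feature Fusion Strategy). Since no official code is linked, here is a minimal PyTorch sketch of one plausible reading: attention scores over N frames and K spatial regions drive two fusion branches, one selecting each region from its highest-scoring frame and one taking a score-weighted average over frames, with the two fused maps concatenated. The tensor shapes, the stripe layout, and the `fuse_features` name are our assumptions, not the authors' implementation.

```python
import torch

def fuse_features(region_feats: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """Hypothetical fusion sketch (shapes are assumptions, not the paper's code).

    region_feats: (N, K, C, Hr, W) -- N frames, K horizontal stripes per frame
    scores:       (N, K)           -- one attention score per frame/region
    Returns:      (2C, K*Hr, W)    -- max-selected map and weighted-average map,
                                      stacked over regions, concatenated on channels
    """
    N, K, C, Hr, W = region_feats.shape

    # Branch 1: for each region, keep the feature from the highest-scoring frame.
    best_frame = scores.argmax(dim=0)                      # (K,)
    selected = region_feats[best_frame, torch.arange(K)]   # (K, C, Hr, W)

    # Branch 2: score-weighted average of each region across frames.
    weights = scores / scores.sum(dim=0, keepdim=True)     # normalize per region
    averaged = (weights[..., None, None, None] * region_feats).sum(dim=0)

    # Reassemble the K stripes along height, then concatenate the two branches.
    selected_map = torch.cat(tuple(selected), dim=1)       # (C, K*Hr, W)
    averaged_map = torch.cat(tuple(averaged), dim=1)       # (C, K*Hr, W)
    return torch.cat([selected_map, averaged_map], dim=0)  # (2C, K*Hr, W)

# Toy usage with the paper's N = 4 frames and K = 4 regions.
feats = torch.randn(4, 4, 2048, 4, 8)
scores = torch.rand(4, 4)
fused = fuse_features(feats, scores)   # shape: (4096, 16, 8)
```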
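The Dataset Splits row states that MARS's 1,261 identities divide into 625 for training and 636 for testing. The key property is that the split is by identity, not by tracklet, so no person appears in both sets. MARS ships a fixed official split; the random split below is purely illustrative, and the `tracklets` structure with a `pid` key is hypothetical.

```python
import random

def identity_split(tracklets, n_train=625, seed=0):
    """Illustrative identity-disjoint split (MARS's official split is fixed,
    not random; tracklets carrying a "pid" key is an assumed structure)."""
    pids = sorted({t["pid"] for t in tracklets})
    random.Random(seed).shuffle(pids)
    train_pids = set(pids[:n_train])
    train = [t for t in tracklets if t["pid"] in train_pids]
    test = [t for t in tracklets if t["pid"] not in train_pids]
    return train, test
```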
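The Experiment Setup row pins down most training hyperparameters. The sketch below wires them into standard PyTorch/torchvision components; the plain `resnet50` is a stand-in for the paper's modified backbone, and the ImageNet normalization statistics are our assumption.

```python
from torch import nn, optim
from torchvision import models, transforms

# Batch composition quoted in the setup row: P identities x K images each.
P, K_IMAGES = 16, 4   # mini-batch size 64

# Frame augmentation: random horizontal flip + normalization.
train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # ImageNet statistics; the exact values are assumed, not stated in the paper.
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Stand-in for the "modified ResNet50" backbone, ImageNet-initialized.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Triplet loss with the recommended margin of 0.3.
triplet_loss = nn.TripletMarginLoss(margin=0.3)

# Adam with weight decay 0.0005; lr 3e-4, stepped to 3e-5 and 3e-6
# after 200 and 400 epochs (gamma = 0.1 at each milestone).
optimizer = optim.Adam(model.parameters(), lr=3e-4, weight_decay=5e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 400], gamma=0.1)
```

A custom P×K batch sampler (16 identities × 4 images each) would be needed to supply in-batch positives and negatives for the triplet loss; it is omitted here for brevity.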