STA: Spatial-Temporal Attention for Large-Scale Video-Based Person Re-Identification

Authors: Yang Fu, Xiaoyang Wang, Yunchao Wei, Thomas Huang

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on two large-scale benchmarks, i.e., MARS and DukeMTMC-VideoReID.
Researcher Affiliation | Collaboration | Yang Fu (1,2), Xiaoyang Wang (2), Yunchao Wei (1), Thomas Huang (1); (1) IFP, Beckman, UIUC, IL; (2) Nokia Bell Labs, Murray Hill, NJ
Pseudocode | Yes | Algorithm 1: Algorithm of Feature Fusion Strategy (an illustrative fusion sketch appears after this table).
Open Source Code | No | The paper does not provide an explicit statement or link confirming that the source code for the described methodology is publicly available.
Open Datasets | Yes | The MARS dataset (Zheng et al. 2016) is one of the largest video-based person re-identification datasets.
Dataset Splits | Yes | The total of 1,261 identities is split into 625 identities for training and 636 identities for testing.
Hardware Specification | Yes | Our model is implemented on the PyTorch platform and trained with two NVIDIA TITAN X GPUs.
Software Dependencies | No | The paper mentions the "PyTorch platform" but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | Yes | We first randomly select N = 4 frames from the input tracklet and use the modified ResNet-50, initialized on the ImageNet (Deng et al. 2009) dataset, as the backbone network. The number of spatial regions is set to K = 4, and each frame is augmented by random horizontal flipping and normalization. Each mini-batch is sampled with P randomly selected identities and K randomly sampled images per identity from the training set; we set P = 16 and K = 4, so the mini-batch size is 64. We recommend setting the margin parameter in the triplet loss to 0.3. During training, we use Adam (Kingma and Ba 2014) with weight decay 0.0005 to optimize the parameters for 70 epochs. The overall learning rate is initialized to 0.0003 and decays to 3×10^-5 and 3×10^-6 after training for 200 and 400 epochs, respectively; the total training process lasts 800 epochs. (A minimal training-setup sketch also follows the table.)
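The Pseudocode row above cites Algorithm 1, the feature fusion strategy, but this page does not reproduce the algorithm itself. As a rough illustration only, the PyTorch sketch below fuses per-frame, per-region features using attention scores derived from L2 norms; the function name `fuse_frame_features`, the (N, K, C) tensor layout, and the two-branch (score-weighted sum plus best-frame selection) design are assumptions for illustration, not the authors' released code.

```python
import torch

def fuse_frame_features(feat: torch.Tensor) -> torch.Tensor:
    """Illustrative fusion of per-frame, per-region features.

    feat: (N, K, C) tensor -- N frames, K horizontal regions,
          C-dim feature per region (shapes are assumed, not from the paper).
    Returns a (K, 2*C) clip-level descriptor.
    """
    # Attention score per (frame, region): L2 norm of the region feature,
    # normalized over frames so that scores for each region sum to 1.
    norms = feat.norm(p=2, dim=2)                        # (N, K)
    scores = norms / norms.sum(dim=0, keepdim=True)      # (N, K)

    # Branch 1: score-weighted average over frames for each region.
    weighted = (scores.unsqueeze(2) * feat).sum(dim=0)   # (K, C)

    # Branch 2: for each region, take the feature from its highest-scoring frame.
    best = scores.argmax(dim=0)                          # (K,)
    picked = feat[best, torch.arange(feat.size(1))]      # (K, C)

    # Concatenate the two branches into the final descriptor.
    return torch.cat([weighted, picked], dim=1)          # (K, 2*C)
```

For example, a tracklet of N = 4 frames with K = 4 regions and 2048-dim region features, `fuse_frame_features(torch.randn(4, 4, 2048))`, yields a (4, 4096) descriptor.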
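The hyperparameters quoted in the Experiment Setup row map directly onto standard PyTorch training components. Below is a minimal sketch assuming the quoted learning-rate milestones (decay at epochs 200 and 400, i.e. gamma = 0.1) and the 800-epoch budget; the `nn.Linear` stand-in backbone and random input tensors are placeholders (the paper uses a modified ResNet-50 and samples batches of P identities with K images each), not the authors' implementation.

```python
import torch
from torch import nn, optim

# Placeholder backbone; the paper uses a modified ResNet-50 (not shown here).
model = nn.Linear(2048, 128)

# Adam with weight decay 0.0005 and initial learning rate 3e-4, as quoted.
optimizer = optim.Adam(model.parameters(), lr=3e-4, weight_decay=5e-4)

# 3e-4 -> 3e-5 after epoch 200 -> 3e-6 after epoch 400 (gamma = 0.1).
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 400], gamma=0.1)

# Triplet loss with the recommended margin of 0.3.
triplet = nn.TripletMarginLoss(margin=0.3)

P, K = 16, 4  # P identities x K images per identity -> mini-batch size 64
for epoch in range(800):
    # Random tensors stand in for a batch sampled with P identities and
    # K images each; real training mines anchor/positive/negative triplets.
    anchor = model(torch.randn(P * K, 2048))
    positive = model(torch.randn(P * K, 2048))
    negative = model(torch.randn(P * K, 2048))

    loss = triplet(anchor, positive, negative)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the learning-rate schedule once per epoch
```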