Learning Attribute-Specific Representations for Visual Tracking

Authors: Yuankai Qi, Shengping Zhang, Weigang Zhang, Li Su, Qingming Huang, Ming-Hsuan Yang. Pages 8835-8842.

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We carry out experiments on the large challenging UAVTraffic and OTB100 datasets to demonstrate the generalization ability and the effectiveness of the proposed method against state-of-the-art tracking algorithms."
Researcher Affiliation | Academia | (1) Harbin Institute of Technology, Weihai, China; (2) University of Chinese Academy of Sciences, Beijing, China; (3) University of California at Merced, America
Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper.
Open Source Code | No | The paper provides no link to, or explicit statement about releasing, source code for the described method.
Open Datasets | Yes | "We use the VOT datasets (Kristan et al. 2013; 2014; 2015) excluding the ones that appear in the OTB100 dataset as training data. (I) The object tracking benchmark dataset OTB100 (Wu, Lim, and Yang 2015)... (II) The unmanned aerial vehicle dataset for traffic, UAVTraffic (Du et al. 2018), collected by ourselves under different weather..."
Dataset Splits | No | The paper describes how training data are grouped by attribute and how positive/negative samples are cropped, but it does not specify a distinct validation split for hyperparameter tuning or model selection.
Hardware Specification | Yes | "Our unoptimized implementation runs at about one frame per second on a computer with an Intel i7-4790 CPU, 16 GB RAM, and a GeForce TITAN 1080Ti GPU card."
Software Dependencies | No | "We implement our algorithm in MATLAB, and use the MatConvNet toolbox (Vedaldi and Lenc 2015) to train the proposed network." While software names are mentioned, specific version numbers for MatConvNet or other libraries are not provided.
Experiment Setup | Yes | The sample parameters P, Q, and M are set to 32, 96, and 256, respectively. Each branch converges after approximately 200 iterations with a fixed learning rate of 0.001. The learning rates of the ensemble layers and the last FC layer are set to 0.001. The network converges after approximately 150 iterations. Positive and negative samples are extracted from the starting frame, with IoU overlap ratios of 0.7 and 0.3 with the ground-truth bounding box, respectively.
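The IoU-thresholded sampling rule in the setup row (positives at 0.7 overlap with the ground truth, negatives at 0.3) can be sketched as follows. This is a minimal illustration, not the authors' code: the box format `(x, y, w, h)` and the function names `iou` and `label_sample` are assumptions for clarity.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle corners.
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def label_sample(candidate, ground_truth, pos_thr=0.7, neg_thr=0.3):
    """Label a candidate crop by the paper's overlap thresholds:
    positive if IoU >= pos_thr, negative if IoU <= neg_thr."""
    overlap = iou(candidate, ground_truth)
    if overlap >= pos_thr:
        return "positive"
    if overlap <= neg_thr:
        return "negative"
    return None  # ambiguous overlap: crop is discarded
```

For example, a 100x100 crop shifted 10 px from the ground truth has IoU of about 0.82 and would be kept as a positive, while one shifted 80 px in both axes has IoU of about 0.02 and becomes a negative.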