Discriminative and Robust Online Learning for Siamese Visual Tracking

Authors: Jinghao Zhou, Peng Wang, Haoyang Sun

AAAI 2020, pp. 13017-13024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 Experiments. 4.2 Comparison with state-of-the-art. 4.3 Ablation Study. Table 1: state-of-the-art comparison on two popular tracking benchmarks OTB2015 and VOT2018 with their running speed."
Researcher Affiliation | Academia | "Jinghao Zhou, Peng Wang, Haoyang Sun. School of Computer Science and School of Automation, Northwestern Polytechnical University, China; National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, China. jensen.zhoujh@gmail.com, peng.wang@nwpu.edu.cn, sunhaoyang@mail.nwpu.edu.cn"
Pseudocode | Yes | "Algorithm 1: Tracking algorithm" (a hedged sketch of a loop in this spirit appears after this table)
Open Source Code | Yes | "Our method is implemented in Python with PyTorch, and the complete code and video demo will be made available at https://github.com/shallowtoil/DROL."
Open Datasets | Yes | "OTB100, VOT2018, VOT2018-LT, UAV123, TrackingNet, and LaSOT." OTB2015 (Wu, Lim, and Yang 2015); VOT2018 (Kristan et al. 2018)
Dataset Splits | Yes | "The above hyper-parameters are set using VOT2018 as the validation set and are further evaluated in Section 5."
Hardware Specification | Yes | "The speed is tested on Nvidia GTX 1080Ti GPU."
Software Dependencies | No | "Our method is implemented in Python with PyTorch, and the complete code and video demo will be made available at https://github.com/shallowtoil/DROL." The paper mentions Python and PyTorch but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | "For the classification subnet, the first layer is a 1×1 convolutional layer with ReLU activation, which reduces the feature dimensionality to 64. The last layer employs a 4×4 kernel with a single output channel. (...) For online tuning, we use the region of size 255×255 of the first frame to pre-train the whole classifier. (...) The classifier is updated every 10 frames with a learning rate set to 0.01 and doubled once neighbouring distractors are detected. To fuse classification scores, we set λ to 0.6 in DROL-FC and 0.8 in DROL-RPN and DROL-Mask. (...) we update the short-term template every T = 5 frames, while τc, υr, and υc are set to 0.75, 0.6, and 0.5 respectively."
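
The quoted setup is concrete enough to sketch the online classification subnet's layer shapes. The following is a minimal PyTorch sketch, not the authors' code: the backbone channel count `in_channels`, the padding, and any intermediate layers elided by "(...)" are assumptions; only the 1×1 conv + ReLU down to 64 channels and the final 4×4 single-channel layer come from the excerpt.

```python
import torch
import torch.nn as nn

class OnlineClassifier(nn.Module):
    """Sketch of the classification subnet described in the excerpt:
    a 1x1 conv + ReLU reducing features to 64 channels, then a 4x4 conv
    producing a single-channel score map. in_channels and the layers
    elided by "(...)" in the quote are assumptions."""

    def __init__(self, in_channels: int = 256):  # assumed; not stated in the excerpt
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, 64, kernel_size=1)  # 1x1 dimensionality reduction
        self.relu = nn.ReLU(inplace=True)
        self.score = nn.Conv2d(64, 1, kernel_size=4)  # 4x4 kernel, single output channel

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.score(self.relu(self.reduce(feat)))
```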
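The update schedule and score fusion can likewise be sketched from the quoted constants (classifier update every 10 frames, learning rate 0.01 doubled near distractors, fusion weight λ, short-term template every T = 5 frames). This is a hypothetical reading of "Algorithm 1: Tracking algorithm": the linear fusion rule λ·online + (1 − λ)·offline and every helper name (`siamese`, `classifier`, `distractors_detected`, `refresh_template`) are assumptions, not the paper's API.

```python
# Hypothetical per-frame tracking loop built only from the constants quoted
# above; the fusion rule and all helper names are assumptions.
LAMBDA = 0.8           # fusion weight: 0.8 for DROL-RPN/DROL-Mask, 0.6 for DROL-FC
UPDATE_INTERVAL = 10   # classifier updated every 10 frames
TEMPLATE_INTERVAL = 5  # short-term template updated every T = 5 frames
BASE_LR = 0.01         # online learning rate, doubled when distractors are detected

def track(frames, siamese, classifier, template):
    for t, frame in enumerate(frames):
        s_off = siamese.score(frame, template)        # offline Siamese score map
        s_on = classifier.score(frame)                # online classification score map
        fused = LAMBDA * s_on + (1 - LAMBDA) * s_off  # assumed linear fusion of the two maps
        box = siamese.locate(fused)                   # target state from the fused map

        if t % UPDATE_INTERVAL == 0:
            lr = BASE_LR * (2 if classifier.distractors_detected() else 1)
            classifier.update(frame, box, lr=lr)      # online tuning step
        if t % TEMPLATE_INTERVAL == 0:
            template = siamese.refresh_template(frame, box)  # short-term template update
        yield box
```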