SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines

Authors: Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, Gang Yu (pp. 12549-12556)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive analysis and ablation studies demonstrate the effectiveness of our proposed guidelines. Without bells and whistles, our SiamFC++ tracker achieves state-of-the-art performance on five challenging benchmarks (OTB2015, VOT2018, LaSOT, GOT-10k, TrackingNet), which proves both the tracking and generalization ability of the tracker.
Researcher Affiliation | Collaboration | Yinda Xu (1), Zeyu Wang (2), Zuoxin Li (2), Ye Yuan (2), Gang Yu (2); (1) College of Electrical Engineering, Zhejiang University; (2) Megvii Inc.
Pseudocode | No | The paper describes the architecture and processes of SiamFC++ but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We will release the code to facilitate further researches.
Open Datasets | Yes | We adopt ILSVRC-VID/DET (Russakovsky et al. 2015), COCO (Lin et al. 2014), YouTube-BB (Real et al. 2017), LaSOT (Fan et al. 2019) and GOT-10k (Huang, Zhao, and Huang 2018) as our basic training set.
Dataset Splits | Yes | On the GOT-10k val subset, we obtain an AO of 77.8 for the tracker predicting PSS and an AO of 78.0 for the tracker predicting IoU.
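The row above compares quality branches that regress PSS versus IoU on the validation split. For reference, IoU (intersection over union) between two axis-aligned boxes can be computed as below; this is a minimal sketch for illustration, not code from the paper's release, and the `(x1, y1, x2, y2)` box convention is an assumption.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```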
Hardware Specification | Yes | The proposed tracker with AlexNet backbone runs at 160 FPS on the VOT2018 short-term benchmark, while the one with GoogLeNet backbone runs at about 90 FPS, both evaluated on an NVIDIA RTX 2080Ti GPU.
Software Dependencies | No | The paper implies a typical deep-learning stack (e.g., PyTorch and ImageNet pretraining) but does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | We first train our model for 5 warm-up epochs with the learning rate linearly increased from 10^-7 to 2×10^-3, then use a cosine annealing learning rate schedule for the remaining 45 epochs, with 600k image pairs for each epoch. We choose stochastic gradient descent (SGD) with a momentum of 0.9 as the optimizer.
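The schedule quoted above (linear warm-up followed by cosine annealing) can be sketched as a per-epoch learning-rate function. This is an illustrative reconstruction under stated assumptions, not the authors' training code: the function name `lr_at_epoch` is hypothetical, and the cosine phase is assumed to decay toward zero.

```python
import math

def lr_at_epoch(epoch, warmup_epochs=5, total_epochs=50,
                base_lr=2e-3, warmup_start_lr=1e-7):
    """Learning rate at a given epoch: linear warm-up, then cosine annealing."""
    if epoch < warmup_epochs:
        # Linear warm-up from 10^-7 up to the base rate of 2*10^-3.
        t = epoch / warmup_epochs
        return warmup_start_lr + t * (base_lr - warmup_start_lr)
    # Cosine annealing over the remaining 45 epochs, decaying toward zero.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))
```

In PyTorch this behavior is commonly obtained by chaining a warm-up scheduler with `torch.optim.lr_scheduler.CosineAnnealingLR`.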