Visual Tracking via Hierarchical Deep Reinforcement Learning

Authors: Dawei Zhang, Zhonglong Zheng, Riheng Jia, Minglu Li

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the GOT-10k, OTB-100, UAV-123, VOT and LaSOT tracking benchmarks demonstrate that the proposed tracker achieves state-of-the-art performance while running in real-time.
Researcher Affiliation | Academia | 1. College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, China; 2. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Pseudocode | Yes | Algorithm 1: Hierarchical Decision Tracking
    Input: frames {I_t} (t = 1, ..., T), initial bounding box b_1
    Output: optimal target locations {b_t} (t = 2, ..., T)
    for t = 2 : T do
        Obtain the initial state s_{t,1} according to b_{t-1}
        Actor agent selects the action a
        Obtain the state s_{t,k}
        Policy agent selects the action p
        while p == search do
            Actor agent selects the action a
            k ← k + 1
            Obtain the state s_{t,k+1}
            Policy agent selects the action p
        end while
        if p ∈ {update, reinit} then
            Conduct the expert demonstration b_e
        end if
        Obtain the optimal bounding box b_t
    end for
    (A Python sketch of this loop is given after the table.)
Open Source Code | No | The paper does not contain any explicit statements about making the source code available or provide a link to a code repository for the methodology described.
Open Datasets | Yes | The proposed tracker is trained on the training set of GOT-10k (Huang, Zhao, and Huang 2019), a large-scale dataset comprising 9,335 training sequences, 180 validation videos and a further 180 testing videos for evaluation.
Dataset Splits | Yes | The proposed tracker is trained on the training set of GOT-10k (Huang, Zhao, and Huang 2019), a large-scale dataset comprising 9,335 training sequences, 180 validation videos and a further 180 testing videos for evaluation.
Hardware Specification | Yes | Our tracker is implemented in Python with the PyTorch 1.2 framework and runs at about 40 fps on a PC with an Intel(R) Xeon(R) CPU E5-2683 @ 2.10 GHz, 64 GB RAM and an NVIDIA GeForce GTX 2080 Ti GPU.
Software Dependencies | Yes | Our tracker is implemented in Python with the PyTorch 1.2 framework.
Experiment Setup | Yes | Implementation Details. Training: We initialize our backbone network with the parameters pre-trained on ImageNet (Russakovsky et al. 2015). The proposed tracker is trained on the training set of GOT-10k (Huang, Zhao, and Huang 2019)... We crop the image patch within the bounding box scaled by µ = 1.5 and resize it to 128 × 128 × 3 to fit the input size of the network. During training, four GPUs are used and a total of M = 12 training agents is set. The discount factor γ is set to 1. In each iteration of the reinforcement learning, we randomly select a sequence of length L = 5 for the tracking simulation. We apply the Adam optimizer to train the model for 40,000 episodes until convergence. The learning rate of both agents is set to 10^-6, while the weight-decay coefficient is set to 10^-4. Tracking: For inference, our model can directly perform robust tracking without any online updating of the network. To ensure tracking efficiency, the maximal number k of policy actions is set to 3 for each frame. For the actions update and reinit, we select the demonstration of an expert tracker as the tracking result; the role of expert tracker is assigned to SiamRPN++ (Li et al. 2019) or DiMP-50 (Bhat et al. 2019). (A hedged configuration sketch based on these values follows the table.)
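
The following is a minimal Python sketch of the per-frame decision loop quoted in the Pseudocode row above. The `actor_agent`, `policy_agent` and `expert_tracker` objects and the `get_state` crop helper are hypothetical stand-ins with assumed interfaces; this is an illustration of the quoted Algorithm 1, not the authors' implementation (no source code is released).

```python
import numpy as np

K_MAX = 3  # maximal number of policy actions per frame, as stated in the paper


def get_state(frame, box, scale=1.5, size=128):
    """Crop `frame` around `box` = (x, y, w, h) scaled by `scale` and resize to
    size x size with a crude nearest-neighbour resample (assumed pre-processing;
    the crop is taken to lie inside the frame)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    hw, hh = scale * w / 2.0, scale * h / 2.0
    H, W = frame.shape[:2]
    x0, x1 = max(int(cx - hw), 0), min(int(cx + hw), W)
    y0, y1 = max(int(cy - hh), 0), min(int(cy + hh), H)
    patch = frame[y0:y1, x0:x1]
    rows = np.linspace(0, patch.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, patch.shape[1] - 1, size).astype(int)
    return patch[rows][:, cols]


def track_sequence(frames, b1, actor_agent, policy_agent, expert_tracker):
    """Return one bounding box per frame, starting from the first-frame box b1."""
    boxes = [b1]
    for t in range(1, len(frames)):
        b_prev = boxes[-1]
        k = 1
        # Actor agent refines the previous box from the initial state s_{t,1}.
        box = actor_agent.act(get_state(frames[t], b_prev), b_prev)
        # Policy agent decides what to do with the actor's result.
        p = policy_agent.act(get_state(frames[t], box))

        # Keep searching until the policy agent stops or k reaches K_MAX.
        while p == "search" and k < K_MAX:
            box = actor_agent.act(get_state(frames[t], box), box)
            k += 1
            p = policy_agent.act(get_state(frames[t], box))

        # On "update" or "reinit", take the expert tracker's demonstration
        # (SiamRPN++ or DiMP-50 in the paper) as the tracking result.
        if p in ("update", "reinit"):
            box = expert_tracker.predict(frames[t], b_prev)

        boxes.append(box)
    return boxes
```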
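
For reference, the hyper-parameters quoted in the Experiment Setup row can be collected into a short PyTorch-style configuration sketch. The `actor_net` and `policy_net` modules below are placeholders (the paper's actual architecture and ImageNet-pretrained backbone are not reproduced), and the multi-GPU, multi-agent training setup is only recorded as constants.

```python
import torch

# Hyper-parameters quoted from the paper's implementation details.
NUM_AGENTS   = 12        # M parallel training agents
NUM_GPUS     = 4
GAMMA        = 1.0       # discount factor
SEQ_LEN      = 5         # length L of the sequence sampled per RL iteration
NUM_EPISODES = 40_000    # training episodes until convergence
LR           = 1e-6      # learning rate of both agents
WEIGHT_DECAY = 1e-4      # weight-decay coefficient
CROP_SCALE   = 1.5       # bounding-box scale factor (mu) for the input patch
INPUT_SIZE   = (128, 128, 3)

# Placeholder networks standing in for the actor and policy agents.
actor_net  = torch.nn.Linear(INPUT_SIZE[0] * INPUT_SIZE[1] * INPUT_SIZE[2], 4)
policy_net = torch.nn.Linear(INPUT_SIZE[0] * INPUT_SIZE[1] * INPUT_SIZE[2], 4)

# Both agents are trained with Adam at the same learning rate and weight decay.
optimizer = torch.optim.Adam(
    list(actor_net.parameters()) + list(policy_net.parameters()),
    lr=LR,
    weight_decay=WEIGHT_DECAY,
)
```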