Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking

Authors: Yongxin Li, Mengyuan Liu, You Wu, Xucheng Wang, Xiangyang Yang, Shuiwang Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five tracking benchmarks affirm the effectiveness and versatility of our approach, positioning it as a state-of-the-art solution in visual tracking.
Researcher Affiliation | Academia | College of Computer Science and Engineering, Guilin University of Technology, Guilin, China.
Pseudocode | No | The paper describes algorithms and methods using text and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is released at: https://github.com/wuyou3474/AVTrack.
Open Datasets | Yes | Training. We employ the training splits of multiple datasets for training, including GOT-10k (Huang et al., 2021), LaSOT (Fan et al., 2019), COCO (Lin et al., 2014), and TrackingNet (Muller et al., 2018). Our method is evaluated on five UAV tracking benchmarks, i.e., UAV123 (Mueller et al., 2016), UAV123@10fps (Mueller et al., 2016), VisDrone2018 (Zhu et al., 2018), UAVDT (Du et al., 2018), and DTB70 (Li & Yeung, 2017).
Dataset Splits | No | The paper states 'We employ the training splits of multiple datasets for training' but does not specify explicit train/validation/test split percentages, sample counts, or the methodology for generating these splits.
Hardware Specification | Yes | Our evaluation is performed on a PC equipped with an i9-10850K processor (3.6 GHz), 16 GB of RAM, and an NVIDIA Titan X GPU. To test our method on a real drone, we integrated an embedded onboard processor, the NVIDIA Jetson AGX Xavier 32GB, into a typical UAV platform.
Software Dependencies | No | The paper mentions using the AdamW optimizer and pre-trained ImageNet weights, but it does not specify version numbers for any software, libraries, or frameworks used in the experiments (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | The batch size is uniformly fixed at 32. We utilize the AdamW optimizer with a weight decay of 10^-4, and 4×10^-5 is used as the initial learning rate. The total number of training epochs is uniformly fixed at 300, with 60,000 image pairs processed per epoch, and the learning rate drops by a factor of 10 after 240 epochs. The sizes of the search region and template are set to 256×256 and 128×128, respectively. Finally, the total loss function is given by L_total = L_cls + λ_iou·L_iou + λ_L1·L_L1 + γ·L_spar + κ·L_vir, where the constants λ_iou = 2 and λ_L1 = 5 are set as in (Cui et al., 2022; Ye et al., 2022a), γ is set to 50, and κ is set to 0.0001.
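
For concreteness, the sketch below translates the quoted optimizer, schedule, and loss combination into code, assuming a PyTorch implementation (the paper does not name its framework, so this is an assumption). The placeholder model and the individual loss-term arguments are hypothetical stand-ins, not the authors' actual components.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the tracking network; not the authors' model.
model = nn.Linear(256, 4)

# AdamW with weight decay 1e-4 and an initial learning rate of 4e-5.
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-5, weight_decay=1e-4)

# 300 epochs total; the learning rate drops by a factor of 10 after epoch 240.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[240], gamma=0.1)

# Loss weights quoted above: lambda_iou = 2, lambda_L1 = 5, gamma = 50, kappa = 1e-4.
LAMBDA_IOU, LAMBDA_L1, GAMMA, KAPPA = 2.0, 5.0, 50.0, 1e-4

def total_loss(l_cls, l_iou, l_l1, l_spar, l_vir):
    # L_total = L_cls + λ_iou·L_iou + λ_L1·L_L1 + γ·L_spar + κ·L_vir
    return l_cls + LAMBDA_IOU * l_iou + LAMBDA_L1 * l_l1 + GAMMA * l_spar + KAPPA * l_vir
```

After each training epoch, `scheduler.step()` would advance the schedule, reproducing the single step decay at epoch 240 described in the setup.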