Compact Transformer Tracker with Correlative Masked Modeling

Authors: Zikai Song, Run Luo, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show the proposed compact transformer tracker outperforms existing approaches, including advanced attention variants, and demonstrates the sufficiency of self-attention in tracking tasks. Our method achieves state-of-the-art performance on five challenging datasets: the VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k benchmarks.
Researcher Affiliation | Academia | Zikai Song [1], Run Luo [1], Junqing Yu [1], Yi-Ping Phoebe Chen [2], Wei Yang [1]. [1] Huazhong University of Science and Technology, China; [2] La Trobe University, Australia.
Pseudocode | No | The paper describes its approach verbally and with figures but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Our project is available at https://github.com/HUSTDML/CTTrack.
Open Datasets | Yes | We adopt COCO (Lin et al. 2014), LaSOT (Fan et al. 2019), GOT-10k (Huang, Zhao, and Huang 2019), and TrackingNet (Muller et al. 2018) as our training datasets, except for the GOT-10k benchmark.
Dataset Splits | No | The paper mentions training on various datasets and an ablation study on LaSOT used as a validation measure, but it does not provide explicit percentages or sample counts for the training/validation/test splits needed to reproduce the data partitioning.
Hardware Specification | Yes | We train our model on 4 Nvidia Tesla V100 GPUs for a total of 500 epochs; each epoch uses 6 × 10^4 images.
Software Dependencies | Yes | Our approach is implemented in Python 3.7 with PyTorch 1.7.
Experiment Setup | Yes | The AdamW optimizer (Loshchilov and Hutter 2018) is employed with an initial learning rate (lr) of 1e-4 and a layer-wise decay of 0.75; the lr decreases according to a cosine function with a final decrease factor of 0.1. We adopt a warm-up lr with a 0.2 warm-up factor over the first 5 epochs. We train our model on 4 Nvidia Tesla V100 GPUs for a total of 500 epochs; each epoch uses 6 × 10^4 images. The mini-batch size is set to 128 images, with each GPU hosting 32 images.
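To make the reported training recipe concrete, below is a minimal PyTorch sketch of the described optimizer and schedule: AdamW with a base lr of 1e-4, layer-wise lr decay of 0.75, a 5-epoch warm-up starting at a 0.2 factor, and cosine decay to 0.1 of the base lr over 500 epochs. The `build_param_groups` helper, the `Toy` model, and the assumption that the model exposes its transformer blocks as `model.layers` are illustrative stand-ins, not taken from the CTTrack repository.

```python
import math
import torch

# Hyperparameters quoted from the paper's experiment setup.
BASE_LR = 1e-4        # initial learning rate
LAYER_DECAY = 0.75    # layer-wise lr decay factor
WARMUP_EPOCHS = 5     # warm-up length
WARMUP_FACTOR = 0.2   # lr multiplier at the start of warm-up
TOTAL_EPOCHS = 500    # total training epochs
FINAL_FACTOR = 0.1    # lr multiplier at the end of the cosine schedule

def build_param_groups(model, num_layers):
    """Scale each layer's lr by LAYER_DECAY per step of depth below the top layer.

    Assumes the model exposes its transformer blocks as `model.layers`
    (a hypothetical structure for this sketch).
    """
    groups = []
    for depth, layer in enumerate(model.layers):
        scale = LAYER_DECAY ** (num_layers - 1 - depth)
        groups.append({"params": layer.parameters(), "lr": BASE_LR * scale})
    return groups

def lr_multiplier(epoch):
    """Linear warm-up from WARMUP_FACTOR to 1, then cosine decay to FINAL_FACTOR."""
    if epoch < WARMUP_EPOCHS:
        return WARMUP_FACTOR + (1.0 - WARMUP_FACTOR) * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return FINAL_FACTOR + 0.5 * (1.0 - FINAL_FACTOR) * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    # Tiny stand-in model so the sketch runs end to end.
    class Toy(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.layers = torch.nn.ModuleList(torch.nn.Linear(8, 8) for _ in range(4))

    model = Toy()
    optimizer = torch.optim.AdamW(build_param_groups(model, num_layers=4))
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_multiplier)
    for epoch in range(TOTAL_EPOCHS):
        # ... one epoch over ~6e4 images with mini-batch 128 would go here ...
        scheduler.step()
```

The multiplier is 0.2 at epoch 0, reaches 1.0 at the end of warm-up, and decays along a cosine to 0.1 at epoch 500, matching the "final decrease factor of 0.1" in the quoted setup.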