Compact Transformer Tracker with Correlative Masked Modeling
Authors: Zikai Song, Run Luo, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed compact transformer tracker outperforms existing approaches, including advanced attention variants, and demonstrate the sufficiency of self-attention in tracking tasks. Our method achieves state-of-the-art performance on five challenging benchmarks: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k. |
| Researcher Affiliation | Academia | Zikai Song1, Run Luo1, Junqing Yu1, Yi-Ping Phoebe Chen2, Wei Yang1; 1Huazhong University of Science and Technology, China; 2La Trobe University, Australia |
| Pseudocode | No | The paper describes its approach verbally and with figures but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our project is available at https://github.com/HUSTDML/CTTrack. |
| Open Datasets | Yes | We adopt COCO (Lin et al. 2014), LaSOT (Fan et al. 2019), GOT-10k (Huang, Zhao, and Huang 2019), and TrackingNet (Muller et al. 2018) as our training datasets, except for the GOT-10k benchmark. |
| Dataset Splits | No | The paper mentions training on various datasets and uses LaSOT for its ablation study, but it does not provide explicit percentages or sample counts for the training/validation/test splits needed to reproduce the data partitioning. |
| Hardware Specification | Yes | We train our model on 4 Nvidia Tesla V100 GPUs for a total of 500 epochs; each epoch uses 6×10⁴ images. |
| Software Dependencies | Yes | Our approach is implemented in Python 3.7 with PyTorch 1.7. |
| Experiment Setup | Yes | The AdamW optimizer (Loshchilov and Hutter 2018) is employed with an initial learning rate (lr) of 1e-4 and layer-wise decay of 0.75; the lr decreases according to a cosine schedule with a final decrease factor of 0.1. We adopt a warm-up lr with a 0.2 warm-up factor for the first 5 epochs. We train our model on 4 Nvidia Tesla V100 GPUs for a total of 500 epochs; each epoch uses 6×10⁴ images. The mini-batch size is set to 128 images, with each GPU hosting 32 images. (See the configuration sketch below.) |
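
The training recipe quoted above can be approximated in PyTorch as a minimal sketch. This is not the authors' released training code: the `layerwise_lr_groups` helper, the `CTTrack` model handle, and the `train_one_epoch` call are hypothetical placeholders, and only the quoted values (lr 1e-4, layer-wise decay 0.75, cosine decay to a 0.1 final factor, 0.2 warm-up factor over 5 epochs, 500 epochs, batch size 128 across 4 GPUs) come from the report; the weight decay is left at PyTorch's default because it is not specified.

```python
# Hedged sketch of the described schedule: AdamW with layer-wise lr decay,
# a 5-epoch warm-up starting at 0.2x, and cosine decay down to 0.1x.
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def layerwise_lr_groups(backbone_layers, head_params, base_lr=1e-4, decay=0.75):
    """Assign smaller learning rates to earlier backbone layers (layer-wise decay)."""
    groups = [{"params": head_params, "lr": base_lr}]
    num_layers = len(backbone_layers)
    for depth, layer in enumerate(backbone_layers):
        # The deepest layer keeps base_lr; each earlier layer is scaled by `decay`.
        scale = decay ** (num_layers - 1 - depth)
        groups.append({"params": layer.parameters(), "lr": base_lr * scale})
    return groups

def warmup_cosine(epoch, total_epochs=500, warmup_epochs=5,
                  warmup_factor=0.2, final_factor=0.1):
    """Per-epoch lr multiplier: linear warm-up from 0.2x, then cosine decay to 0.1x."""
    if epoch < warmup_epochs:
        return warmup_factor + (1.0 - warmup_factor) * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_factor + (1.0 - final_factor) * cosine

# Usage (hypothetical model exposing `.backbone.blocks` and `.head`):
# model = CTTrack(...)
# optimizer = AdamW(layerwise_lr_groups(model.backbone.blocks,
#                                       model.head.parameters()),
#                   lr=1e-4)
# scheduler = LambdaLR(optimizer, lr_lambda=warmup_cosine)
# for epoch in range(500):
#     train_one_epoch(model, optimizer)  # mini-batch of 128 = 4 GPUs x 32 images
#     scheduler.step()
```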