ODTrack: Online Dense Temporal Token Learning for Visual Tracking

Authors: Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, Xianxian Li

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "ODTrack achieves a new SOTA performance on seven benchmarks, while running at real-time speed. Code and models are available at https://github.com/GXNU-ZhongLab/ODTrack." ... "Our approach achieves a new state-of-the-art tracking results on seven visual tracking benchmarks, including LaSOT, TrackingNet, GOT10K, LaSOText, VOT2020, TNL2K, and OTB100."
Researcher Affiliation | Academia | Yaozong Zheng (1,2), Bineng Zhong (1,2)*, Qihua Liang (1,2), Zhiyi Mo (3), Shengping Zhang (4), Xianxian Li (1,2). (1) Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University; (2) Guangxi Key Lab of Multi-Source Information Mining and Security, Guangxi Normal University; (3) Guangxi Key Laboratory of Machine Vision and Intelligent Control, Wuzhou University; (4) Harbin Institute of Technology
Pseudocode | No | The paper describes the architecture and processes in text and diagrams (Fig. 2, Fig. 3) and provides mathematical formulations (Eq. 1-5), but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code and models are available at https://github.com/GXNU-ZhongLab/ODTrack."
Open Datasets | Yes | "The training data includes LaSOT (Fan et al. 2019), GOT-10k (Huang, Zhao, and Huang 2021), TrackingNet (Müller et al. 2018), and COCO (Lin et al. 2014)."
Dataset Splits | No | The paper does not explicitly detail a validation dataset split with specific percentages or sample counts. It mentions training data and then uses the term "validation" in the context of ablating the effectiveness of token propagation, not as a specific dataset split for hyperparameter tuning.
Hardware Specification | Yes | "The model is conducted on a server with two 80GB Tesla A100 GPUs and set the batch size to be 8." ... "The proposed ODTrack is tested on a 2080Ti, and it runs at 32 fps."
Software Dependencies | No | The paper mentions AdamW (optimizer), ViT-Base (model), and MAE (pre-training parameters) but does not provide specific version numbers for any software libraries, programming languages, or environments used for implementation.
Experiment Setup | Yes | "We employ the AdamW to optimize the network parameters with initial learning rate of 1 x 10^-5 for the backbone, 1 x 10^-4 for the rest, and set the weight decay to 10^-4. We set the training epochs to 300 epochs. 60,000 image pairs are randomly sampled in each epoch. The learning rate drops by a factor of 10 after 240 epochs." ... "The model is conducted on a server with two 80GB Tesla A100 GPUs and set the batch size to be 8. In terms of input data, we take the video sequence including three reference frames with 192 x 192 pixels and two search frames with 384 x 384 pixels as the input to the model."
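The quoted learning-rate schedule is a plain step decay and can be sketched directly. The following is a minimal illustration, not the authors' code: the hyperparameter values (base rates 1e-5 / 1e-4, drop at epoch 240 of 300, factor 10) are taken from the quote above, while the function name is hypothetical.

```python
def learning_rate(epoch, base_lr, drop_epoch=240, factor=0.1):
    """Step decay as described in the paper's setup: the learning rate
    drops by a factor of 10 after epoch 240 (of 300 total epochs)."""
    return base_lr * factor if epoch >= drop_epoch else base_lr

# Per the quoted setup: 1e-5 for the backbone, 1e-4 for the rest.
backbone_lr_late = learning_rate(250, 1e-5)  # after the drop
head_lr_early = learning_rate(100, 1e-4)     # before the drop
```

In a PyTorch-style training loop, the same effect is usually obtained with two optimizer parameter groups (backbone vs. rest) and a step scheduler, but the paper does not specify the framework version.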