MixFormerV2: Efficient Fully Transformer Tracking

Authors: Yutao Cui, Tianhui Song, Gangshan Wu, Limin Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 experiments. "We evaluate the performance of our proposed trackers on 6 benchmark datasets, including the large-scale LaSOT [20], LaSOText [20], TrackingNet [42], UAV123 [41], TNL2K [48] and VOT2022 [30]."
Researcher Affiliation | Academia | Yutao Cui, Tianhui Song, Gangshan Wu, Limin Wang. State Key Laboratory for Novel Software Technology, Nanjing University, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/MCG-NJU/MixFormerV2
Open Datasets | Yes | The training datasets include the TrackingNet [42], LaSOT [20], GOT-10k [28] and COCO [35] training splits, which are the same as MixFormer [14].
Dataset Splits | No | The paper lists training datasets (TrackingNet, LaSOT, GOT-10k, COCO) and test datasets, but does not explicitly specify a separate validation dataset split.
Hardware Specification | Yes | The distillation training is conducted on 8 NVIDIA Quadro RTX 8000 GPUs. Inference runs on one NVIDIA Quadro RTX 8000 GPU and an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz.
Software Dependencies | Yes | The trackers are implemented using Python 3.6 and PyTorch 1.7.
Experiment Setup | Yes | Each distillation training stage takes 500 epochs, where the first m = 40 epochs progressively eliminate layers. The score prediction MLP is trained for an additional 50 epochs. The batch size is 256, with each GPU holding 32 samples. The optimizer is AdamW with a weight decay of 10^-4. The initial learning rate is 10^-4 and is decreased to 10^-5 after 400 epochs. Horizontal flip and brightness jittering are used for data augmentation. The resolutions of search and template images for MixFormerV2-B are 288×288 and 128×128, respectively; for MixFormerV2-S they are 224×224 and 112×112, enabling real-time tracking on a CPU platform.
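The schedule reported above can be sketched in plain Python. This is a hypothetical illustration of the stated hyperparameters only (the constant and function names are ours, not from the MixFormerV2 codebase): a step learning rate of 10^-4 dropped to 10^-5 after epoch 400, a 40-epoch progressive layer-elimination phase at the start of the 500-epoch stage, and a global batch of 256 split evenly over 8 GPUs.

```python
# Hypothetical sketch of the reported training schedule; names are
# illustrative and do not come from the MixFormerV2 implementation.

TOTAL_EPOCHS = 500        # one distillation training stage
ELIMINATION_EPOCHS = 40   # m: progressive layer-elimination phase
LR_DROP_EPOCH = 400       # learning rate decays after this epoch
GLOBAL_BATCH = 256
NUM_GPUS = 8

def lr_at_epoch(epoch: int) -> float:
    """Step schedule from the paper: 1e-4, decreased to 1e-5 after 400 epochs."""
    return 1e-4 if epoch < LR_DROP_EPOCH else 1e-5

def in_elimination_phase(epoch: int) -> bool:
    """True during the first m = 40 epochs, when layers are progressively eliminated."""
    return epoch < ELIMINATION_EPOCHS

# 256 samples split over 8 GPUs -> 32 samples per GPU, matching the paper.
per_gpu_batch = GLOBAL_BATCH // NUM_GPUS

if __name__ == "__main__":
    print(per_gpu_batch)   # 32
    print(lr_at_epoch(0))  # 0.0001
```

In a real PyTorch setup this step schedule would typically be expressed with `torch.optim.AdamW` plus a milestone-based scheduler; the pure-Python form above just makes the reported numbers explicit.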