Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
MixFormerV2: Efficient Fully Transformer Tracking
Authors: Yutao Cui, Tianhui Song, Gangshan Wu, Limin Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): We evaluate the performance of our proposed trackers on 6 benchmark datasets, including the large-scale LaSOT [20], LaSOT_ext [20], TrackingNet [42], UAV123 [41], TNL2K [48] and VOT2022 [30]. |
| Researcher Affiliation | Academia | Yutao Cui, Tianhui Song, Gangshan Wu, Limin Wang; State Key Laboratory for Novel Software Technology, Nanjing University, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/MCG-NJU/MixFormerV2 |
| Open Datasets | Yes | The training datasets include the TrackingNet [42], LaSOT [20], GOT-10k [28] and COCO [35] training splits, which are the same as those used for MixFormer [14]. |
| Dataset Splits | No | The paper lists training datasets (TrackingNet, LaSOT, GOT-10k, COCO) and test datasets, but does not explicitly specify a separate validation dataset split. |
| Hardware Specification | Yes | The distillation training is conducted on 8 NVidia Quadro RTX 8000 GPUs. The inference process runs on one NVidia Quadro RTX 8000 GPU and Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz. |
| Software Dependencies | Yes | Our trackers are implemented using Python 3.6 and PyTorch 1.7. |
| Experiment Setup | Yes | Each distillation training stage takes 500 epochs, where the first m = 40 epochs are used for progressively eliminating layers. We train the score prediction MLP for an additional 50 epochs. The batch size is 256, with each GPU holding 32 samples. We use the AdamW optimizer with a weight decay of 10⁻⁴. The initial learning rate is 10⁻⁴ and is decreased to 10⁻⁵ after 400 epochs. We use horizontal flip and brightness jittering for data augmentation. The resolutions of the search and template images for MixFormerV2-B are 288×288 and 128×128, respectively. For MixFormerV2-S, the search and template resolutions are 224×224 and 112×112, enabling real-time tracking on a CPU platform. |
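As a minimal sketch, the schedule reported in the Experiment Setup row can be expressed as a few plain-Python helpers. The constants come from that row; the helper names (`global_batch_size`, `learning_rate`, `in_elimination_phase`) and the assumption that epochs are counted from 0 are illustrative and hypothetical, not taken from the MixFormerV2 code base.

```python
"""Hypothetical sketch of the reported MixFormerV2 distillation schedule.

Constants are taken from the reproducibility report; helper names and
0-indexed epoch counting are assumptions, not the authors' implementation.
"""

TOTAL_EPOCHS = 500        # epochs per distillation training stage
ELIMINATION_EPOCHS = 40   # m: epochs spent progressively eliminating layers
SCORE_MLP_EPOCHS = 50     # additional epochs for the score prediction MLP
NUM_GPUS = 8              # Quadro RTX 8000 GPUs used for distillation
SAMPLES_PER_GPU = 32      # samples held by each GPU
INITIAL_LR = 1e-4         # initial AdamW learning rate
DECAYED_LR = 1e-5         # learning rate after the drop
LR_DROP_EPOCH = 400       # epoch at which the learning rate is decreased
WEIGHT_DECAY = 1e-4       # AdamW weight decay


def global_batch_size(num_gpus: int = NUM_GPUS, per_gpu: int = SAMPLES_PER_GPU) -> int:
    """Global batch size = number of GPUs x samples per GPU (8 x 32 = 256)."""
    return num_gpus * per_gpu


def learning_rate(epoch: int) -> float:
    """Step schedule: INITIAL_LR before LR_DROP_EPOCH, DECAYED_LR from then on.

    Assumes epochs are 0-indexed, so epochs 0..399 use 1e-4.
    """
    return INITIAL_LR if epoch < LR_DROP_EPOCH else DECAYED_LR


def in_elimination_phase(epoch: int) -> bool:
    """True during the first m epochs, when layers are progressively eliminated."""
    return epoch < ELIMINATION_EPOCHS


if __name__ == "__main__":
    print("global batch:", global_batch_size())
    for e in (0, 39, 40, 399, 400, TOTAL_EPOCHS - 1):
        print(e, learning_rate(e), in_elimination_phase(e))
```

In a PyTorch training loop, the same settings would plausibly map onto `torch.optim.AdamW(params, lr=1e-4, weight_decay=1e-4)` combined with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[400], gamma=0.1)`, stepped once per epoch; the report does not state which scheduler the released code actually uses.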