reproducibilityindex.ai

MixFormerV2: Efficient Fully Transformer Tracking

Authors: Yutao Cui, Tianhui Song, Gangshan Wu, Limin Wang

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments: We evaluate the performance of our proposed trackers on 6 benchmark datasets, including the large-scale LaSOT [20], LaSOT_ext [20], TrackingNet [42], UAV123 [41], TNL2K [48] and VOT2022 [30].
Researcher Affiliation	Academia	Yutao Cui Tianhui Song Gangshan Wu Limin Wang State Key Laboratory for Novel Software Technology, Nanjing University, China
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	https://github.com/MCG-NJU/MixFormerV2
Open Datasets	Yes	The training datasets include the TrackingNet [42], LaSOT [20], GOT-10k [28] and COCO [35] training splits, which are the same as for MixFormer [14].
Dataset Splits	No	The paper lists training datasets (TrackingNet, LaSOT, GOT-10k, COCO) and test datasets, but does not explicitly specify a separate validation dataset split.
Hardware Specification	Yes	The distillation training is conducted on 8 NVIDIA Quadro RTX 8000 GPUs. The inference process runs on one NVIDIA Quadro RTX 8000 GPU and an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz.
Software Dependencies	Yes	Our trackers are implemented using Python 3.6 and PyTorch 1.7.
Experiment Setup	Yes	Each distillation training stage takes 500 epochs, where the first m = 40 epochs are for progressively eliminating layers. We train the score prediction MLP for an additional 50 epochs. The batch size is 256, with each GPU holding 32 samples. We use the AdamW optimizer with a weight decay of 10^-4. The initial learning rate is 10^-4 and is decreased to 10^-5 after 400 epochs. We use horizontal flip and brightness jittering for data augmentation. The resolutions of search and template images for MixFormerV2-B are 288×288 and 128×128 respectively. For MixFormerV2-S, the resolutions of search and template images are 224×224 and 112×112, for real-time tracking on CPU platforms.
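
The training hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not the authors' code. The class, field, and function names are hypothetical, and the 0-indexed epoch convention, including the exact epoch at which the learning rate drops, is an assumption. The sketch uses only the Python standard library, so it does not reproduce the actual PyTorch optimizer setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistillConfig:
    """Values quoted in the Experiment Setup row; all names are hypothetical."""
    epochs_per_stage: int = 500          # epochs per distillation stage
    layer_elimination_epochs: int = 40   # m: epochs spent eliminating layers
    score_mlp_epochs: int = 50           # extra epochs for the score prediction MLP
    num_gpus: int = 8
    samples_per_gpu: int = 32
    weight_decay: float = 1e-4           # AdamW weight decay
    base_lr: float = 1e-4                # initial learning rate
    final_lr: float = 1e-5               # learning rate after the drop
    lr_drop_epoch: int = 400             # assumed 0-indexed boundary

    @property
    def global_batch_size(self) -> int:
        # 8 GPUs x 32 samples per GPU gives the reported batch size of 256.
        return self.num_gpus * self.samples_per_gpu


def learning_rate(cfg: DistillConfig, epoch: int) -> float:
    """Step schedule: base_lr before lr_drop_epoch, final_lr from then on."""
    return cfg.base_lr if epoch < cfg.lr_drop_epoch else cfg.final_lr


def in_layer_elimination_phase(cfg: DistillConfig, epoch: int) -> bool:
    """True during the first m epochs, when layers are progressively eliminated."""
    return epoch < cfg.layer_elimination_epochs


if __name__ == "__main__":
    cfg = DistillConfig()
    print("global batch size:", cfg.global_batch_size)
    for epoch in (0, 39, 40, 399, 400, 499):
        print(epoch, learning_rate(cfg, epoch), in_layer_elimination_phase(cfg, epoch))
```

The global batch size is derived from the GPU count and per-GPU sample count rather than stored separately, so the sketch also checks that the reported figures agree with one another.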