Sequential Fusion Based Multi-Granularity Consistency for Space-Time Transformer Tracking

Authors: Kun Hu, Wenjing Yang, Wanrong Huang, Xianchen Zhou, Mingyu Cao, Jing Ren, Huibin Tan

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments have shown that our STCFormer outperforms many of the best-performing trackers on several popular benchmarks. [...] This section introduces the implementation details at the beginning. Then we display the results of comparison with predominant algorithms. In the final part, we perform an ablation study to judge the contribution of each constraint and analyze our model from different perspectives."
Researcher Affiliation | Academia | "Kun Hu¹*, Wenjing Yang¹*, Wanrong Huang¹, Xianchen Zhou², Mingyu Cao¹, Jing Ren¹, Huibin Tan¹. ¹Department of Intelligent Data Science, College of Computer Science and Technology, National University of Defense Technology. ²College of Sciences, National University of Defense Technology."
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing open-source code, nor a link to a code repository for the described method.
Open Datasets | Yes | "The model is trained with the following datasets: COCO (Lin et al. 2014), LaSOT (Fan et al. 2018), GOT-10k (Huang, Zhao, and Huang 2018) and TrackingNet (Müller et al. 2018)."
Dataset Splits | No | The paper mentions training and testing on COCO, LaSOT, GOT-10k, and TrackingNet, and notes that GOT-10k has a training split, but it does not explicitly specify train/validation/test splits (e.g., percentages, per-split sample counts, or a cross-validation setup) for all datasets used.
Hardware Specification | Yes | "We implement STCFormer using Python 3.8 and PyTorch 1.9. It is trained on a server with 8 NVIDIA A100 GPUs. The inference speed is tested with only one NVIDIA RTX 2080 Ti GPU."
Software Dependencies | Yes | "We implement STCFormer using Python 3.8 and PyTorch 1.9." (A minimal version check is sketched after the table.)
Experiment Setup | Yes | "The batch size of each GPU is 28 and we train the model with the AdamW optimizer (Loshchilov and Hutter 2017). The weight decay is 10⁻⁴. The initial learning rate is 3 × 10⁻⁶ for the backbone and 3 × 10⁻⁵ for other parameters. We set total training epochs to 300 with 60k image pairs per epoch and the learning rate decreases by a factor of 10 after 240 epochs. [...] λ_iou and λ_L1 are the regularization terms and are set to 2 and 5, respectively. [...] where the regularization terms λ_LCL, λ_ACL and λ_SCL are set to 1, 0.005 and 0.1, respectively." (See the configuration sketch after the table.)
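Since the paper ships no requirements file, a minimal sanity check for the two quoted versions might look like the sketch below; treating any PyTorch 1.9.x build as "PyTorch 1.9" is our assumption, not the authors' statement.

```python
import sys
import torch

# Minimal environment check matching the versions quoted above.
# Assumption: any PyTorch 1.9.x patch release counts as "PyTorch 1.9".
assert sys.version_info[:2] == (3, 8), "paper reports Python 3.8"
assert torch.__version__.startswith("1.9"), "paper reports PyTorch 1.9"
print(f"Python {sys.version.split()[0]}, PyTorch {torch.__version__}")
```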
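Because no code is released, the quoted hyperparameters can only be sketched. The following is a minimal sketch, assuming a backbone/other-parameters split and placeholder loss terms; the module names, parameter grouping, and loss definitions are our assumptions, not the authors' implementation.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder modules: the real STCFormer backbone and head are not released.
backbone = torch.nn.Linear(8, 8)  # stands in for the backbone
head = torch.nn.Linear(8, 4)      # stands in for "other parameters"

# Optimizer and schedule using the quoted hyperparameters.
optimizer = AdamW(
    [
        {"params": backbone.parameters(), "lr": 3e-6},  # backbone lr: 3 x 10^-6
        {"params": head.parameters(), "lr": 3e-5},      # other params: 3 x 10^-5
    ],
    weight_decay=1e-4,
)
# Learning rate decreases by a factor of 10 after epoch 240 of 300.
scheduler = MultiStepLR(optimizer, milestones=[240], gamma=0.1)

# Loss weights as quoted; the individual loss terms are placeholders since
# their exact definitions appear only in the paper's equations.
LAMBDA_IOU, LAMBDA_L1 = 2.0, 5.0
LAMBDA_LCL, LAMBDA_ACL, LAMBDA_SCL = 1.0, 0.005, 0.1

def total_loss(l_iou, l_l1, l_lcl, l_acl, l_scl):
    """Weighted sum of the regression and consistency losses."""
    return (LAMBDA_IOU * l_iou + LAMBDA_L1 * l_l1
            + LAMBDA_LCL * l_lcl + LAMBDA_ACL * l_acl + LAMBDA_SCL * l_scl)
```

The two-group optimizer mirrors the paper's distinction between the backbone learning rate and that of all other parameters; everything else (batch size 28 per GPU, 300 epochs, 60k pairs per epoch) would sit in the training loop, which the paper does not describe in code.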