Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Diffusion Mask-Driven Visual-language Tracking
Authors: Guangtong Zhang, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shuxiang Song
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on four tracking benchmarks (i.e., LaSOT, TNL2K, LaSOText, and OTB-Lang), we validate that our proposed Diffusion Mask-Driven Visual-language Tracker can improve the robustness and effectiveness of the model. |
| Researcher Affiliation | Academia | ¹ Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China. ² Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China. ³ Guangxi Key Laboratory of Machine Vision and Intelligent Control, Wuzhou University, Wuzhou 543002, China. |
| Pseudocode | No | The paper includes figures illustrating the framework and processes but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Through extensive experiments on four tracking benchmarks (i.e., LaSOT, TNL2K, LaSOText, and OTB-Lang) |
| Dataset Splits | Yes | We use the training splits of TNL2K [Xiao et al., 2021], LaSOT [Fan et al., 2018], OTB-Lang [Zhenyang et al., 2017b], and RefCOCOg-google [Junhua et al., 2016] for joint training. |
| Hardware Specification | Yes | Our model was implemented in the PyTorch framework on a server with a single NVIDIA V100 GPU. ... We tested the proposed tracker on an NVIDIA 3080 GPU, and the single-sample tracking speed is about 40 FPS. |
| Software Dependencies | No | The paper mentions implementing the model in the "PyTorch framework" but does not specify a version number for PyTorch or for any other software dependency. |
| Experiment Setup | Yes | Our model is trained for 100 epochs, each epoch with 60,000 image pairs and each mini-batch with 64 sample pairs. We also train the model using the AdamW optimizer, set the weight decay to 10⁻⁴, the initial learning rate of the backbone to 2 × 10⁻⁵, and that of other parameters to 2 × 10⁻⁴. After 80 epochs, the learning rate is decreased by a factor of 10. |
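The hyperparameters in the Experiment Setup row are enough to reconstruct the learning-rate schedule and the number of optimizer steps. The following is a minimal, framework-free sketch under stated assumptions, not the authors' code. Epochs are assumed to be 0-indexed, so "after 80 epochs" means epochs 80 to 99 use the decayed rate. The `backbone.` parameter-name prefix is a hypothetical convention used only to illustrate the two learning-rate groups. Because 60,000 / 64 = 937.5, the per-epoch step count depends on whether the final partial batch is dropped, and the paper does not say which choice was made.

```python
"""Sketch of the reported training schedule (values from the Experiment Setup row)."""

EPOCHS = 100
PAIRS_PER_EPOCH = 60_000
BATCH_SIZE = 64
WEIGHT_DECAY = 1e-4      # AdamW weight decay (listed for completeness; unused below)
BACKBONE_LR = 2e-5       # initial learning rate for backbone parameters
OTHER_LR = 2e-4          # initial learning rate for all other parameters
DECAY_EPOCH = 80         # epoch at which the learning rate drops
DECAY_FACTOR = 10        # learning rate is divided by this factor


def lr_at(epoch: int, base_lr: float) -> float:
    """Step schedule: base_lr before DECAY_EPOCH, base_lr / DECAY_FACTOR from then on.

    Assumes 0-indexed epochs, so epochs 0..79 use base_lr and 80..99 the decayed value.
    """
    return base_lr / DECAY_FACTOR if epoch >= DECAY_EPOCH else base_lr


def param_group_lr(param_name: str, epoch: int) -> float:
    """Learning rate for one parameter.

    Hypothetical convention: backbone parameter names start with "backbone.".
    """
    base = BACKBONE_LR if param_name.startswith("backbone.") else OTHER_LR
    return lr_at(epoch, base)


def iterations_per_epoch(drop_last: bool = True) -> int:
    """Optimizer steps per epoch; 60,000 / 64 = 937.5, so the partial batch matters."""
    full, remainder = divmod(PAIRS_PER_EPOCH, BATCH_SIZE)
    return full if drop_last or remainder == 0 else full + 1


if __name__ == "__main__":
    steps = iterations_per_epoch()
    print(f"{steps} steps/epoch, {EPOCHS * steps} steps total")
    for epoch in (0, 79, 80, 99):
        print(
            epoch,
            param_group_lr("backbone.patch_embed.weight", epoch),
            param_group_lr("head.cls.weight", epoch),
        )
```

In PyTorch, this setup would typically map onto `torch.optim.AdamW` with two parameter groups (one per base learning rate) and `torch.optim.lr_scheduler.StepLR(optimizer, step_size=80, gamma=0.1)` stepped once per epoch. The paper does not state which scheduler class was actually used.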