Divert More Attention to Vision-Language Tracking
Authors: Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Tab. 2 presents the results and comparisons of our trackers with other SOTAs on LaSOT [17], LaSOT_ext [16], TNL2K [53], OTB99-LANG [34] and GOT-10k [28]. |
| Researcher Affiliation | Collaboration | 1 Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University; 2 DiDi Chuxing, Beijing, China; 3 Department of Computer Science and Engineering, University of North Texas |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and models are released at https://github.com/JudasDie/SOTS. |
| Open Datasets | Yes | We train the trackers with supernet using training splits of COCO [36], ImageNet-VID [14], ImageNet-DET [14], YouTube-BB [43], GOT-10k [28], LaSOT [17] and TNL2K [53] |
| Dataset Splits | No | The paper mentions using 'validation data' for the evolutionary search ('SUC (success score) on validation data is used as rewards of evolutionary algorithms') but does not specify the exact split percentages or sample counts for the training, validation, and test sets. |
| Hardware Specification | Yes | The whole search pipeline consumes 15 hours on a single RTX-2080Ti GPU. |
| Software Dependencies | No | The paper mentions using BERT [15] and SPOS [26] but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We train the trackers with supernet using training splits of COCO [36], ImageNet-VID [14], ImageNet-DET [14], YouTube-BB [43], GOT-10k [28], LaSOT [17] and TNL2K [53] for 5 epochs, where each epoch contains 1.2 × 10^6 template-search pairs. |