Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DOVTrack: Data-Efficient Open-Vocabulary Tracking

Authors: Zekun Qian, Ruize Han, Zhixiang Wang, Junhui Hou, Wei Feng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method achieves state-of-the-art performance on the OVMOT benchmark, surpassing existing methods by 3.8% in TETA metric, without requiring additional data or annotations.
Researcher Affiliation | Academia | 1College of Intelligence and Computing, Tianjin University, 2Shenzhen University of Advanced Technology, 3City University of Hong Kong EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methods in prose and illustrates them with figures, such as Figure 1, Figure 2, and Figure 3, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The code will be available at https://github.com/zekunqian/DOVTrack.
Open Datasets | Yes | Recently, the TAO dataset [10], with its 833 categories, is the only suitable video training set that meets the diversity requirements for current OVMOT research. Although TAO offers a rich variety of categories, its small size and sparse annotations pose significant challenges for model training.
Dataset Splits | No | Following other OVMOT methods [1, 11, 7, 8], we perform our evaluation with standard OV settings on the TAO dataset, which categorizes rare classes as novel and the others as base classes, similar to LVIS [9]. Comparative experiments are carried out on both the validation and test sets of TAO.
Hardware Specification | Yes | Training is performed for 10 epochs on only 2 RTX 3090 GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries, programming languages, or frameworks used for implementation.
Experiment Setup | Yes | In the proposed TAO training stage, we jointly optimize the association, localization and classification branches on the base set of the TAO training set. Training is performed for 10 epochs on only 2 RTX 3090 GPUs. In the association training, we employ a D2MP-based diffusion model to denoise samples drawn from a standard normal distribution. fθ is implemented as a three-layer fully-connected network, with each layer followed by a ReLU activation and layer normalization. During the first five epochs, the diffusion model is trained on features produced by the association head (without generating new samples), and in the subsequent 5 epochs, we freeze the diffusion model and use it to generate augmented data for further association training. The association loss, consisting of a contrastive loss and an auxiliary loss, is identical to that used in OVTrack. In the classification training, we apply an InfoNCE loss with ρ = 0.1 in Eq. 7 and also employ the standard cross-entropy loss used in OVTrack. In the localization training, the IoU threshold is lowered to 0.3. In the inference stage, we retain all OVTrack settings except that we change the maximum detections per frame to 80, the matching threshold to 0.38 and the memory length to 30.
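
The hyperparameters and the two-phase association schedule quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration of the reported values, not the authors' code: the class names, field names, and the association_phase helper are hypothetical, and epochs are assumed to be 0-indexed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training values quoted in the Experiment Setup row (field names are hypothetical)."""
    total_epochs: int = 10
    diffusion_train_epochs: int = 5          # first five epochs: diffusion model is trained
    num_gpus: int = 2                        # reported as 2 RTX 3090 GPUs
    info_nce_rho: float = 0.1                # rho in Eq. 7 of the paper
    localization_iou_threshold: float = 0.3  # lowered IoU threshold


@dataclass(frozen=True)
class InferenceConfig:
    """Inference values that differ from the OVTrack defaults, as quoted."""
    max_detections_per_frame: int = 80
    matching_threshold: float = 0.38
    memory_length: int = 30


def association_phase(epoch: int, cfg: TrainConfig) -> dict:
    """Describe what the diffusion model does in a given epoch (assumed 0-indexed)."""
    if not 0 <= epoch < cfg.total_epochs:
        raise ValueError(f"epoch must be in [0, {cfg.total_epochs}), got {epoch}")
    training = epoch < cfg.diffusion_train_epochs
    return {
        "train_diffusion": training,                  # fit on association-head features
        "freeze_diffusion": not training,             # frozen in the later epochs
        "generate_augmented_samples": not training,   # augmentation only once frozen
    }


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in range(cfg.total_epochs):
        print(epoch, association_phase(epoch, cfg))
```

Keeping the phase switch in one function makes the boundary between the diffusion-training epochs and the augmentation epochs explicit, which is the part of the setup most easily misread from the prose.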