Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Authors: Jinyang Li, En Yu, Sijia Chen, Wenbing Tao
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method surpasses previous trackers on the open-vocabulary MOT benchmark while also achieving faster inference speeds and significantly reducing preprocessing requirements. The paper includes sections like "4 EXPERIMENTS", "4.1 DATASETS AND EVALUATION METRICS", "4.3 PERFORMANCE COMPARISON ON TAO DATASET", and presents numerous tables with performance metrics (e.g., Table 1, Table 2, Table 3) comparing proposed methods against state-of-the-art. |
| Researcher Affiliation | Academia | Jinyang Li, En Yu, Sijia Chen, Wenbing Tao — Huazhong University of Science and Technology |
| Pseudocode | No | The paper describes the methods and architecture using figures, diagrams, and textual explanations, including mathematical formulations for losses and strategies. However, it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Models and code are released at https://github.com/jinyanglii/OVTR. |
| Open Datasets | Yes | Experimental results on the TAO (Dave et al., 2020) dataset demonstrate that OVTR outperforms state-of-the-art methods... Additionally, in the KITTI (Geiger et al., 2012) transfer experiment... For training, we leveraged the LVIS dataset, which includes 1,203 categories... We compare open-vocabulary MOT performance on OVT-B dataset (Liang & Han). |
| Dataset Splits | Yes | The KITTI dataset, comprising 21 training and 29 test sequences, focuses on autonomous driving scenarios with diverse objects... For evaluation, we use the TAO validation dataset and designate certain base categories that were not learned during training as novel categories. |
| Hardware Specification | Yes | Training is conducted on 4 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper details training parameters, optimizers, and model architecture, but it does not specify version numbers for any software libraries, frameworks, or programming languages used (e.g., Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | Training begins with the detection components, using a batch size of 2 for 33 epochs, a learning rate of 4e-5 that decays by a factor of 10 at the 20th epoch. Next, the dual-branch decoders and the updater are trained with a batch size of 1 for 16 epochs, starting with a learning rate of 4e-5, which decays at the 13th epoch. Multi-frame training is employed, progressively increasing the number of frames from 2 to 3, 4, and 5 at the 4th, 7th, and 14th epochs, respectively. The hyperparameter Οisol, the threshold for the matrix D, is set to a multiple of its mean value due to its variability. Table 12 and Table 13 list comprehensive hyper-parameters used in the detection and tracking training phases, respectively. |
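The two-phase schedule quoted in the Experiment Setup row can be sketched in code. This is a minimal, hedged illustration: the function names (`lr_at_epoch`, `frames_at_epoch`) and the zero-indexed epoch convention are assumptions for clarity; only the numeric values (batch sizes, epoch counts, learning rates, decay milestones, and the 2→3→4→5 frame progression) come from the reported setup.

```python
def lr_at_epoch(base_lr, decay_epoch, epoch, factor=0.1):
    """Step-decay schedule: the learning rate is multiplied by `factor`
    (a factor-of-10 reduction) once `decay_epoch` is reached."""
    return base_lr * factor if epoch >= decay_epoch else base_lr

def frames_at_epoch(epoch):
    """Progressive multi-frame schedule for phase 2:
    2 frames initially, then 3, 4, and 5 at epochs 4, 7, and 14."""
    if epoch >= 14:
        return 5
    if epoch >= 7:
        return 4
    if epoch >= 4:
        return 3
    return 2

# Phase 1: detection components — batch size 2, 33 epochs,
# lr 4e-5 decayed by 10x at the 20th epoch.
phase1 = [(e, lr_at_epoch(4e-5, 20, e)) for e in range(33)]

# Phase 2: dual-branch decoders and the updater — batch size 1, 16 epochs,
# lr 4e-5 decayed at the 13th epoch, with progressive multi-frame training.
phase2 = [(e, lr_at_epoch(4e-5, 13, e), frames_at_epoch(e)) for e in range(16)]
```

In a PyTorch training loop this step decay would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR`; the explicit functions above only make the reported milestones easy to read off.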