Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Dual-Path Temporal Decoder for End-to-End Multi-Object Tracking

Authors: Hyunseop Kim, Juheon Jeong, Hanul Kim, Yeong Jun Koh

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on MOT benchmarks demonstrate that our approach achieves state-of-the-art performance across major tracking metrics, with significant gains in association accuracy and identity consistency. Our results demonstrate the importance of decoupling dynamic appearance modeling from static identity cues, and provide a scalable foundation for robust tracking in complex scenarios.
Researcher Affiliation | Academia | Hyunseop Kim¹, Juheon Jeong¹, Hanul Kim², Yeong Jun Koh¹; ¹Chungnam National University, ²Seoul National University of Science and Technology; EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using textual explanations and architectural diagrams (e.g., Figure 1), but it does not contain any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Code is available at github.com/altkddhfcjs/Dual Temporal MOT
Open Datasets | Yes | The proposed MOT achieves the state-of-the-art performance on the DanceTrack [27] and SportsMOT [9] benchmarks, demonstrating strong association ability in challenging scenarios involving diverse objects with similar appearances. ... MOT17 [23]. It is a widely used pedestrian tracking dataset.
Dataset Splits | Yes | DanceTrack [27]. It is a multi-human tracking dataset in dancing scenes with similar uniform appearance and diverse motion, requiring strong association under occlusion and ambiguity. DanceTrack contains 40, 25, and 35 videos for training, validation, and test sets. SportsMOT [9]. ... The dataset consists of 45, 45, and 150 sports sequences for training, validation, and test sets, respectively. MOT17 [23]. ... It contains 7 training sequences and 7 test sequences.
Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA RTX 4090 Ti GPUs with a batch size of 1, where each batch consists of a 4-frame video clip.
Software Dependencies | No | The paper mentions the use of DINO [37], ResNet-50 [14], and the AdamW optimizer, but it does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA in the main text.
Experiment Setup | Yes | MOT Network. The proposed framework is built on DINO [37] that uses ResNet-50 [14] backbone and transformer-based encoder. We select the top-M = 300 detection candidates from the encoder in DINO as anchor boxes and extract a candidate query feature $q^{(t)}_{D,m}$ for each candidate by combining its learnable query embedding and positional embedding, following [37]. We set the number of dual-path temporal decoder layers to L = 6, a feature dimension to C = 256, and sampling points to K = 256. The confidence threshold α, the suppression threshold β, and τ are set to 0.6, 0.4, and 60, respectively. Training. As in the prior works [18, 22, 36], we perform a two-stage training strategy. In the first stage, the object detector is trained for 40 epochs. In the second stage, the backbone and encoder are frozen, and only the dual-path temporal decoder is trained. The input images are resized to a resolution of 1440 × 800. The proposed MOT framework employs multi-scale training, Mosaic [4], and MixUp [15] for data augmentation. We use the AdamW optimizer with a learning rate of $1 \times 10^{-4}$ and a weight decay of $1 \times 10^{-4}$. The learning rate is decayed by a factor of 0.1 during the final 15 training epochs. The model is trained for 45, 45 and 65 epochs on DanceTrack [27], SportsMOT [9] and MOT17 [23], respectively.
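
The learning-rate schedule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is an illustrative reading of the quoted text, not the authors' code: the names `TrainSetup` and `lr_at_epoch` are hypothetical, the decay is assumed to be a single step applied at the start of the final 15 epochs, and the quoted epoch counts (45, 45, and 65) are assumed to be the length of the schedule being decayed. The quoted text does not say whether those counts include the 40-epoch detector stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    base_lr: float = 1e-4        # AdamW learning rate
    weight_decay: float = 1e-4   # AdamW weight decay
    decay_factor: float = 0.1    # multiplier applied in the final epochs
    decay_epochs: int = 15       # length of the decayed tail
    total_epochs: int = 45       # 45 for DanceTrack and SportsMOT, 65 for MOT17


def lr_at_epoch(epoch: int, cfg: TrainSetup) -> float:
    """Return the learning rate for a 0-indexed epoch.

    Assumes one step decay at the start of the last `decay_epochs` epochs.
    """
    if not 0 <= epoch < cfg.total_epochs:
        raise ValueError("epoch out of range")
    decay_start = cfg.total_epochs - cfg.decay_epochs
    if epoch >= decay_start:
        return cfg.base_lr * cfg.decay_factor
    return cfg.base_lr


if __name__ == "__main__":
    dancetrack = TrainSetup(total_epochs=45)
    mot17 = TrainSetup(total_epochs=65)
    # Print the first epoch at which the decayed rate applies for each setup.
    for name, cfg in (("DanceTrack", dancetrack), ("MOT17", mot17)):
        start = cfg.total_epochs - cfg.decay_epochs
        print(name, "decay starts at epoch", start, "lr", lr_at_epoch(start, cfg))
```

Under these assumptions the decayed rate applies from epoch 30 for the 45-epoch schedules and from epoch 50 for the 65-epoch MOT17 schedule. A reproduction should confirm the actual schedule against the released code.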