Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
LoRATv2: Enabling Low-Cost Temporal Modeling in One-Stream Trackers
Authors: Liting Lin, Heng Fan, Zhipeng Zhang, Yuqing Huang, Yaowei Wang, Yong Xu, Haibin Ling
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive experiments on multiple benchmarks, LoRATv2 achieves state-of-the-art performance, substantially improved efficiency, and a superior performance-to-FLOPs ratio over state-of-the-art trackers. |
| Researcher Affiliation | Collaboration | Liting Lin (Pengcheng Laboratory); Heng Fan (University of North Texas); Zhipeng Zhang (Shanghai Jiao Tong University; Anyverse Intelligence); Yuqing Huang (Pengcheng Laboratory); Yaowei Wang (Harbin Institute of Technology, Shenzhen; Pengcheng Laboratory); Yong Xu (South China University of Technology); Haibin Ling (Westlake University) |
| Pseudocode | No | The paper only describes algorithmic steps in prose and through architectural diagrams (e.g., Figure 2, Figure 3), without specific pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/LitingLin/LoRATv2. |
| Open Datasets | Yes | We use LaSOT [16], TrackingNet [28], GOT-10k [20] (excluding 1k sequences as in [21]), and COCO [26] for training. |
| Dataset Splits | Yes | We use LaSOT [16], TrackingNet [28], GOT-10k [20] (excluding 1k sequences as in [21]), and COCO [26] for training. For the GOT-10k evaluation, models are trained exclusively on the GOT-10k training split. We compare our LoRATv2 with recent Transformer-based trackers on four challenging benchmarks, following their official evaluation protocols. |
| Hardware Specification | Yes | All models are trained on 4 NVIDIA GeForce RTX 4090 GPUs and evaluated on an NVIDIA GeForce RTX 5090 GPU. |
| Software Dependencies | No | The paper mentions using specific techniques and pre-trained models (e.g., DINOv2), but does not list specific software libraries or their version numbers (like Python, PyTorch, CUDA versions) that would be needed to replicate the environment. |
| Experiment Setup | Yes | Phase 1 (224 variants): Models are trained for 170 epochs (131,072 iterations/epoch) on (z, x1) pairs. The template (z) and search region x1 are sampled from the same video (up to a 100-frame gap), with strong crop jitter applied to x1. The ViT backbone (DINOv2 pre-trained [29, 12]) is frozen; two sets of LoRA modules (rank r = 64), one for the template stream and one for the search region stream, are trained. Phase 2 (378 variants): Training continues for an additional 170 epochs on (z, x1, x2) triplets. The template z is randomly sampled from a video; x1, x2 are sampled from the same video (up to a 100-frame gap) with strong crop jitter. The backbone and previously trained LoRA modules remain frozen. A new set of extra LoRA modules (rank r = 64) and a corresponding prediction head are introduced exclusively for the x2 stream. |
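
The frame sampling described in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sample_pair` and `sample_triplet` are hypothetical, frames are represented only by their indices, and crop jitter is omitted. Two points are assumptions rather than statements from the paper: that the Phase 2 gap limit applies between x1 and x2, and that x2 does not precede x1.

```python
import random

MAX_GAP = 100  # maximum frame gap stated in the setup


def sample_pair(num_frames, rng, max_gap=MAX_GAP):
    """Phase 1 sketch: template z and search frame x1 from one video,
    at most max_gap frames apart (in either direction)."""
    z = rng.randrange(num_frames)
    lo = max(0, z - max_gap)
    hi = min(num_frames - 1, z + max_gap)
    x1 = rng.randint(lo, hi)  # randint is inclusive on both ends
    return z, x1


def sample_triplet(num_frames, rng, max_gap=MAX_GAP):
    """Phase 2 sketch: z is drawn anywhere in the video; x1 and x2 are
    drawn so that x2 follows x1 by at most max_gap frames.
    Assumption: the gap limit constrains (x1, x2) and x2 >= x1."""
    z = rng.randrange(num_frames)
    x1 = rng.randrange(num_frames)
    hi = min(num_frames - 1, x1 + max_gap)
    x2 = rng.randint(x1, hi)
    return z, x1, x2


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    print("pair:", sample_pair(500, rng))
    print("triplet:", sample_triplet(500, rng))
```

Passing an explicit `random.Random` instance keeps the sampling repeatable, which matters when comparing runs; a real data loader would additionally crop and jitter the image regions around each sampled frame's target box.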