Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Open-World Drone Active Tracking with Goal-Centered Rewards

Authors: Haowei Sun, Jinwu Hu, Zhirui Zhang, Haoyuan Tian, Xinze Xie, Yufeng Wang, Xiaohua Xie, Yun Lin, Zhuliang Yu, Mingkui Tan

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on simulator and real-world images demonstrate the superior performance of GC-VAT, achieving a Tracking Success Rate of approximately 72% on the simulator. The benchmark and code are available at https://github.com/SHWplus/DAT_Benchmark.
Researcher Affiliation Academia 1 South China University of Technology, 2 Institute for Super Robotics (Huangpu), 3 Pazhou Laboratory, 4 Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, 5 Peng Cheng Laboratory, 6 Sun Yat-sen University, 7 Harbin Engineering University
Pseudocode Yes Algorithm 1 Curriculum-Based Training (CBT)
Open Source Code Yes The benchmark and code are available at https://github.com/SHWplus/DAT_Benchmark.
Open Datasets Yes First, we propose DAT, the first open-world drone active air-to-ground tracking benchmark. It encompasses 24 city-scale scenes, featuring targets with human-like behaviors and high-fidelity dynamics simulation. DAT also provides a digital twin tool for unlimited scene generation. ...The benchmark and code are available at https://github.com/SHWplus/DAT_Benchmark. ...We perform zero-shot transfer tests using 8 videos each from VOT [30], DTB70 [35] and UAVDT [20] datasets.
Dataset Splits Yes We conduct cross-scene and cross-domain tests. The former tests an agent trained under daytime conditions in unseen scenes with the same weather. The latter evaluates the agent in the same scene under varying weather conditions. See Appendix E.1 for details. Metrics. We use cumulative reward ($\mathrm{CR} = \sum_{t=1}^{E_l} r_{gc}$) and tracking success rate ($\mathrm{TSR} = \frac{1}{E_{ml}} \sum_{t=1}^{E_l} r_{dt} \times 100\%$) to evaluate the agent performance. CR primarily reflects how well the agent centers the target over episode length $E_l$, while TSR measures the ability to keep the target in view, with $r_{dt} = 1$ meaning the target is within the view (See Appendix C), and $E_{ml}$ denoting the maximum episode length. Agents are initialized at four relative angles to the target ([0, π/2] rad), with 10 episodes per angle (40 total). The mean and variance of these results are calculated for each map, and the final cross-scene and cross-domain performance are averaged across different scenes.
Hardware Specification Yes Furthermore, as a critical step beyond image-based evaluation, we conduct real-world experiments on a DJI Mini 3 Pro [13] drone. As shown in Fig. 7, we deploy GC-VAT on a laptop equipped with an RTX 3050 GPU and an Intel i5 CPU, use the DJI Mobile SDK [12] to obtain images, and control the drone with the predicted actions.
Software Dependencies No For the training method of GC-VAT, we choose to use PPO algorithm. In our two-stage curriculum learning process, we employ identical domain randomization. The structure of the GC-VAT is shown in Fig. 11. In this figure, C8×8-16S4 represents 16 convolutional filters of size 8×8 and stride 4. GRU256 denotes a GRU network with 256 hidden units, and FC200 represents a fully connected layer with 200 neurons.
Experiment Setup Yes The hyperparameters of the PPO algorithm used in this article are set as follows: discount factor γ = 0.9, GAE discount factor λ = 0.95, entropy coefficient β = 0.01, PPO clipping parameter ϵ = 0.2. In our two-stage curriculum learning process, we employ identical domain randomization. The flight altitude is selected from the interval [13, 22] m, and the camera pitch angle is chosen from [0.6, 1.38] rad. These parameters are consistent throughout each episode. Meanwhile, the drone's initial orientation relative to the target fluctuates within the range [−π, π] rad, and the target's initial position is set between [−4.5, −2.5] ∪ [2.5, 4.5] m. The training involves a range of 9.2M to 21.3M steps across 35 parallel environments. The webots runs at 500 Hz, with the algorithm updating every four steps (125 Hz). Episodes last up to 1500 steps and were terminated early if the drone lost the target for over 100 consecutive steps, collided, or crashed. The drone translation speed is set to 40 m/s, and rotational speed to 2 rad/s. The map features 40 vehicles, each with a maximum speed of 20 m/s and acceleration of 25 m/s².
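
The metrics quoted under Dataset Splits and the termination rule quoted under Experiment Setup can be combined into a small evaluation sketch. The following is a minimal illustration, not the authors' code: the function name `evaluate_episode`, the `EpisodeResult` container, and the input format (one goal-centered reward and one in-view flag per step) are hypothetical. It also assumes that a step with r_dt = 0 counts as "lost" for the early-termination rule, and it leaves out collisions and crashes, which would need simulator state.

```python
from dataclasses import dataclass
from typing import Sequence

MAX_EPISODE_LENGTH = 1500  # E_ml: "Episodes last up to 1500 steps"
MAX_LOST_STEPS = 100       # terminate once the target is lost for over 100 consecutive steps


@dataclass
class EpisodeResult:
    steps: int                    # E_l, the number of steps actually run
    cumulative_reward: float      # CR = sum over t of r_gc
    tracking_success_rate: float  # TSR in percent, normalized by E_ml


def evaluate_episode(r_gc: Sequence[float], r_dt: Sequence[int]) -> EpisodeResult:
    """Replay per-step rewards, apply early termination, and return CR and TSR.

    r_gc: goal-centered reward per step (hypothetical input format).
    r_dt: 1 if the target is within the view at that step, else 0.
    """
    if len(r_gc) != len(r_dt):
        raise ValueError("r_gc and r_dt must have the same length")

    cumulative_reward = 0.0
    in_view_steps = 0
    lost_streak = 0
    steps = 0

    for reward, visible in zip(r_gc, r_dt):
        if steps >= MAX_EPISODE_LENGTH:
            break  # the episode length is capped at E_ml
        steps += 1
        cumulative_reward += reward
        if visible:
            in_view_steps += 1
            lost_streak = 0
        else:
            lost_streak += 1
            if lost_streak > MAX_LOST_STEPS:
                break  # assumption: r_dt = 0 is what "lost" means here

    # TSR divides by the maximum length E_ml, not by the steps actually run.
    tsr = 100.0 * in_view_steps / MAX_EPISODE_LENGTH
    return EpisodeResult(steps, cumulative_reward, tsr)


if __name__ == "__main__":
    # 300 steps in view, then the target stays out of view.
    result = evaluate_episode([0.5] * 300 + [0.0] * 200, [1] * 300 + [0] * 200)
    print(result)
```

Because TSR is normalized by the maximum episode length rather than the realized length, an episode that ends early is penalized in TSR even if the target was in view for every step before termination.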