Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Where Does It Exist from the Low-Altitude: Spatial Aerial Video Grounding

Authors: Yang Zhan, Yuan Yuan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our SAVG-DETR significantly outperforms existing state-of-the-art methods.
Researcher Affiliation	Academia	School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University
Pseudocode	No	The paper describes the methodology using architectural diagrams (Figure 3, Figure 4) and mathematical equations (e.g., equations 2, 4-10, 12-14, 17-21) and textual descriptions, but does not present any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The dataset and code will be available at here.
Open Datasets	Yes	To facilitate research in this field, we introduce the novel spatial aerial video grounding (SAVG) task. Specifically, we meticulously construct a large-scale benchmark, UAV-SVG, which contains over 2 million frames and offers 216 highly diverse target categories.
Dataset Splits	Yes	The split of training, validation, and testing is shown in Table 6 of the supplementary material. Train 14,060 2,812 ... Val 845 169 ... Test 2,915 583 ...
Hardware Specification	Yes	The proposed SAVG-DETR is trained using PyTorch on 2 NVIDIA L20 48G GPUs with 1 video per GPU and the whole optimization takes around 4 days.
Software Dependencies	No	The proposed SAVG-DETR is trained using PyTorch on 2 NVIDIA L20 48G GPUs...
Experiment Setup	Yes	We empirically use hyper-parameters N = 6, M = 6, λL1 = 5, and λAux GIoU = 4. We set the initial learning rates to 2 × 10^-5 for the visual backbone, 5 × 10^-5 for the language backbone, and 10^-4 for the rest of the network. The learning rate follows a linear schedule with warm-up for the language encoder and the learning rate is dropped by 0.1 after 6 epochs for the rest of the network. We use the AdamW optimizer and weight decay rate 10^-4 for training 20 epochs.
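
The learning-rate settings quoted in the Experiment Setup row can be sketched in plain Python to make the per-group schedules concrete. This is an illustrative reading of the quoted text, not the authors' training code. The group names, the helper functions, and the warm-up length are assumptions, since the quote does not state them. The sketch also assumes that "dropped by 0.1 after 6 epochs" means a single multiplication by 0.1 from the seventh epoch onward, and that the visual backbone follows the same step schedule as the rest of the network.

```python
# Values quoted in the Experiment Setup row.
TOTAL_EPOCHS = 20
WEIGHT_DECAY = 1e-4          # used with the AdamW optimizer
BASE_LR = {
    "visual_backbone": 2e-5,
    "language_backbone": 5e-5,
    "rest": 1e-4,
}

# Assumptions for illustration only (not stated in the quoted text).
DROP_AFTER_EPOCH = 6         # read as: lower the rate once 6 epochs are done
DROP_FACTOR = 0.1            # read as: multiply the rate by 0.1, once


def step_lr(base_lr: float, epoch: int) -> float:
    """Step schedule: full rate for epochs 0..5, then base_lr * 0.1."""
    if epoch >= DROP_AFTER_EPOCH:
        return base_lr * DROP_FACTOR
    return base_lr


def linear_warmup_lr(base_lr: float, step: int, total_steps: int,
                     warmup_steps: int) -> float:
    """Linear warm-up from 0 to base_lr, then linear decay back to 0.

    warmup_steps is a hypothetical parameter; the quote gives no value.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step / warmup_steps)
    decay_span = max(1, total_steps - warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / decay_span)


def lr_for_group(group: str, epoch: int, step: int, total_steps: int,
                 warmup_steps: int) -> float:
    """Pick the schedule for a parameter group under the assumptions above."""
    base = BASE_LR[group]
    if group == "language_backbone":
        return linear_warmup_lr(base, step, total_steps, warmup_steps)
    return step_lr(base, epoch)
```

In an actual PyTorch setup, these three groups would typically be passed as separate parameter groups to a single AdamW optimizer, each with its own learning rate; the exact grouping used by the authors is not given in the quoted text.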