Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Where Does It Exist from the Low-Altitude: Spatial Aerial Video Grounding
Authors: Yang Zhan, Yuan Yuan
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our SAVG-DETR significantly outperforms existing state-of-the-art methods. |
| Researcher Affiliation | Academia | School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University |
| Pseudocode | No | The paper describes the methodology using architectural diagrams (Figure 3, Figure 4) and mathematical equations (e.g., equations 2, 4-10, 12-14, 17-21) and textual descriptions, but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The dataset and code will be available at here. |
| Open Datasets | Yes | To facilitate research in this field, we introduce the novel spatial aerial video grounding (SAVG) task. Specifically, we meticulously construct a large-scale benchmark, UAV-SVG, which contains over 2 million frames and offers 216 highly diverse target categories. |
| Dataset Splits | Yes | The split of training, validation, and testing is shown in Table 6 of the supplementary material. Train 14,060 2,812 ... Val 845 169 ... Test 2,915 583 ... |
| Hardware Specification | Yes | The proposed SAVG-DETR is trained using PyTorch on 2 NVIDIA L20 48G GPUs with 1 video per GPU and the whole optimization takes around 4 days. |
| Software Dependencies | No | The proposed SAVG-DETR is trained using PyTorch on 2 NVIDIA L20 48G GPUs... |
| Experiment Setup | Yes | We empirically use hyper-parameters N = 6, M = 6, λ_L1 = 5, and λ_AuxGIoU = 4. We set the initial learning rates to 2 × 10⁻⁵ for the visual backbone, 5 × 10⁻⁵ for the language backbone, and 10⁻⁴ for the rest of the network. The learning rate follows a linear schedule with warm-up for the language encoder and the learning rate is dropped by 0.1 after 6 epochs for the rest of the network. We use the AdamW optimizer and weight decay rate 10⁻⁴ for training 20 epochs. |
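
The learning-rate settings quoted in the Experiment Setup row can be written down as a small configuration sketch. This is an illustration, not the authors' training code: the names `TrainConfig`, `step_lr`, and `linear_warmup_lr` are hypothetical, "dropped by 0.1" is assumed to mean multiplication by 0.1, epochs are assumed to be counted from 0, and the warm-up length is a placeholder argument because the quoted text does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyper-parameters quoted in the Experiment Setup row (field names are hypothetical)."""
    lr_visual: float = 2e-5      # visual backbone
    lr_language: float = 5e-5    # language backbone
    lr_rest: float = 1e-4        # rest of the network
    weight_decay: float = 1e-4   # used with AdamW
    epochs: int = 20
    drop_epoch: int = 6          # the rate is dropped after this many epochs
    drop_factor: float = 0.1     # assumption: "dropped by 0.1" means multiplied by 0.1


def step_lr(base_lr: float, epoch: int, cfg: TrainConfig) -> float:
    """Step schedule: base rate for epochs 0..drop_epoch-1, scaled rate afterwards."""
    return base_lr * (cfg.drop_factor if epoch >= cfg.drop_epoch else 1.0)


def linear_warmup_lr(base_lr: float, step: int, total_steps: int, warmup_steps: int) -> float:
    """Linear warm-up to base_lr, then linear decay to zero at total_steps.

    warmup_steps is a placeholder: the quoted setup does not give its value.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)


cfg = TrainConfig()
rest_schedule = [step_lr(cfg.lr_rest, epoch, cfg) for epoch in range(cfg.epochs)]
```

Under these assumptions, `rest_schedule` holds the base rate for the first six epochs and one tenth of it for the remaining fourteen. In an actual PyTorch run, the three base rates would typically be passed as separate parameter groups to the optimizer.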