Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
Authors: Pengteng Li, Pinhao Song, Wuyang Li, Huizai Yao, Weiyu Guo, Yijie Xu, Dugang Liu, Hui Xiong
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the VSI-BENCH and STI-BENCH show that SEE&TREK consistently boosts various MLLMS performance across diverse spatial reasoning tasks with the most +3.5% improvement, offering a promising path toward stronger spatial intelligence. |
| Researcher Affiliation | Academia | Pengteng Li (AI Thrust, HKUST(GZ); AI2ROBOTICS); Pinhao Song (KU Leuven); Wuyang Li (EPFL); Huizai Yao (AI Thrust, HKUST(GZ)); Weiyu Guo (AI Thrust, HKUST(GZ); AI2ROBOTICS); Yijie Xu (AI Thrust, HKUST(GZ)); Dugang Liu (SZU); Hui Xiong (AI Thrust, HKUST(GZ); CSE, HKUST) |
| Pseudocode | Yes | The algorithm is presented in Algorithm 1 and the detailed version can be found in Algorithm A.1 in the appendix. |
| Open Source Code | Yes | The link of code: https://github.com/Hoantrbl/See Trek. |
| Open Datasets | Yes | We select VSI-BENCH [38] and STI-BENCH [39] as our spatial evaluation benchmark. [...] These videos are sourced from the validation sets of the public indoor 3D scene reconstruction datasets ScanNet [45], ScanNet++ [46], and ARKitScenes [47] |
| Dataset Splits | Yes | We select VSI-BENCH [38] and STI-BENCH [39] as our spatial evaluation benchmark. [...] VSI-BENCH randomly sample a subset of 400 questions (50 per task), which we will refer to as VSI-BENCH (tiny). [...] All evaluations are conducted under zero-shot settings. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA 8× A6000 and 6× A800. |
| Software Dependencies | No | For this part of algorithm development, we leverage OpenCV for efficient deployment. [...] we utilize YOLOV8-Tiny, named YOLOV8N from Ultralytics |
| Experiment Setup | Yes | SEE&TREK focus on the pre-processing stage of MLLMS, which samples one frame for every four frames from the given spatial video. For fair evaluation, we adopt 8 frames as input to test each MLLMS for the given videos. [...] To ensure reproducibility, we use greedy decoding for all models. |
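
The Experiment Setup row quotes two frame-handling details: one frame is sampled for every four frames, and 8 frames are given to each model. The excerpt does not say how the two steps are combined, so the following is only a minimal sketch under the assumption that stride sampling happens first and the 8 input frames are then chosen evenly from the result. The function names `stride_indices` and `pick_uniform` are hypothetical and are not taken from the paper or its repository.

```python
def stride_indices(num_frames: int, stride: int = 4) -> list[int]:
    """Return the index of every `stride`-th frame, starting at frame 0."""
    if num_frames < 0 or stride < 1:
        raise ValueError("num_frames must be >= 0 and stride must be >= 1")
    return list(range(0, num_frames, stride))


def pick_uniform(indices: list[int], k: int = 8) -> list[int]:
    """Pick up to `k` entries spread evenly over `indices`.

    Assumption: even spacing that keeps the first and last candidate;
    the quoted setup does not state the actual selection rule.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    n = len(indices)
    if n <= k:
        # Not enough candidates to choose from: keep them all.
        return list(indices)
    if k == 1:
        return [indices[0]]
    # Integer arithmetic gives strictly increasing positions when n > k.
    positions = [(i * (n - 1)) // (k - 1) for i in range(k)]
    return [indices[p] for p in positions]


# Example: a hypothetical 100-frame video.
candidates = stride_indices(100, stride=4)   # frames 0, 4, 8, ..., 96
model_input = pick_uniform(candidates, k=8)  # 8 frame indices for the model
```

Only frame indices are computed here; decoding the frames themselves (for instance with OpenCV, which the paper mentions) is left out to keep the sketch dependency-free.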