Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval

Authors: Zebin Yang, Sunjian Zheng, Tong Xie, Tianshi Xu, Bo Yu, Fan Wang, Jie Tang, Shaoshan Liu, Meng Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental results demonstrate that EfficientNav achieves an 11.1% improvement in success rate on the HM3D benchmark over GPT-4-based baselines, and demonstrates 6.7× real-time latency reduction and 4.7× end-to-end latency reduction over the GPT-4 planner. Our code is available on https://github.com/PKUSEC-Lab/EfficientNav. We evaluate EfficientNav on the HM3D dataset [3] based on the Habitat simulation platform [61]. The comparison of SR and SPL is shown in Table 2. The latency comparison is shown in Table 3. Table 4: Ablation study on EfficientNav methods with LLaVA-34b.
Researcher Affiliation	Academia	(1) Institute for Artificial Intelligence, Peking University; (2) School of Integrated Circuits, Peking University; (3) Shenzhen Institute of Artificial Intelligence and Robotics for Society; (4) School of Computer Science and Engineering, South China University of Technology; (5) Beijing Advanced Innovation Center for Integrated Circuits. Corresponding author. Emails: EMAIL, EMAIL
Pseudocode	No	The paper describes the methods narratively in sections like 3.2 Discrete Memory Caching, 3.3 Attention-based Memory Clustering, and 3.4 Semantics-aware Memory Retrieval, without presenting any formal pseudocode blocks or algorithms.
Open Source Code	Yes	Our code is available on https://github.com/PKUSEC-Lab/EfficientNav.
Open Datasets	Yes	We evaluate EfficientNav on the HM3D dataset [3] based on the Habitat simulation platform [61]. Here we evaluate our method on HM3D-OVON [78], which has more target object categories.
Dataset Splits	No	We evaluate EfficientNav on the HM3D dataset [3] based on the Habitat simulation platform [61]. In the simulation platform, the robot can access RGB-D observations of the environment. In each task, the robot is placed at a different starting point in the environment and is only instructed to find a specific object, e.g., "TV", "chair", "sofa", "bed", "toilet", "plant", etc., which is harder than tasks giving detailed, step-by-step directions [2, 48]. For accuracy, we report two metrics: i) the average success rate (SR), and ii) the success rate penalized by path length (SPL), which evaluates both the accuracy and the efficiency of the robot trajectory. This section describes the evaluation setup but does not specify how the HM3D dataset itself was split into training, validation, or test sets, either for the experiments or for defining the evaluation environments.
Hardware Specification	Yes	We implement our methods on 4 LLMs, LLaVA-7b, LLaVA-13b, LLaVA-34b, and LLaMA3.2-11b, on NVIDIA A6000 GPU and Jetson Orin. We use a single NVIDIA RTX A6000 GPU to deploy LLaVA-7b and LLaMA3.2-11b. When using LLaVA-13b and LLaVA-34b, we deploy our system on 2 and 4 NVIDIA RTX A6000 GPUs, respectively. We also deploy LLaVA-7b on a single Jetson AGX Orin.
Software Dependencies	No	The paper mentions several models and platforms, such as LLaVA, LLaMA3.2, Mistral-7B-Instruct-v0.2, LLaMA-3.1, Vicuna-13B-v1.5, Nous-Hermes-2-Yi-34B, Grounding DINO [37], the CLIP model [11], and the Habitat simulation platform [61]. However, it does not specify version numbers for these software components or for general programming frameworks like Python, PyTorch, or CUDA, which are necessary to fully replicate the experiment environment.
Experiment Setup	No	The paper describes the LLM backbones and vision encoders used (LLaVA-7b, LLaMA-3.2-11b, LLaVA-13b, and LLaVA-34b, with their specific underlying models), states that the vision encoders are "fine-tuned during training," and mentions using Grounding DINO for detection. It also refers to thresholds for attention-based memory clustering and semantics-aware memory retrieval. However, specific values for hyperparameters such as learning rate, batch size, number of epochs, and optimizer settings for the vision encoder fine-tuning, as well as the actual values of the mentioned thresholds, are not provided in the main text or appendices.
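The Dataset Splits entry names SR and SPL but does not define them. As a rough illustration only, the sketch below assumes the standard definitions used in embodied navigation (SPL as introduced by Anderson et al., 2018): SR is the fraction of successful episodes, and SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is 1 on success and 0 otherwise, l_i is the shortest-path length from start to goal, and p_i is the length of the path the agent actually took. The names `Episode`, `success_rate`, and `spl` are hypothetical and are not taken from the EfficientNav code; the paper's actual numbers presumably come from the Habitat platform's own metric implementation, which may handle edge cases differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """Hypothetical per-episode record (not from the EfficientNav codebase)."""
    success: bool         # S_i: did the agent stop at a target object?
    shortest_path: float  # l_i: shortest-path length from start to goal
    agent_path: float     # p_i: length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL: mean over episodes of S_i * l_i / max(p_i, l_i)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if not e.success:
            continue  # failed episodes contribute 0
        denom = max(e.agent_path, e.shortest_path)
        # Assumption: an episode that starts at the goal (denom == 0) counts as 1.
        total += (e.shortest_path / denom) if denom > 0 else 1.0
    return total / len(episodes)


if __name__ == "__main__":
    eps = [
        Episode(success=True, shortest_path=5.0, agent_path=10.0),  # detour: 0.5
        Episode(success=True, shortest_path=4.0, agent_path=4.0),   # optimal: 1.0
        Episode(success=False, shortest_path=6.0, agent_path=3.0),  # failure: 0.0
    ]
    print(f"SR  = {success_rate(eps):.3f}")
    print(f"SPL = {spl(eps):.3f}")
```

In this toy example, SR is 2/3 while SPL is 0.5: the first success is discounted because the agent walked twice the shortest distance, which is why SPL captures both accuracy and trajectory efficiency.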