Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

BeliefMapNav: 3D Voxel-Based Belief Map for Zero-Shot Object Navigation

Authors: Zibo Zhou, Yue Hu, Lingkai Zhang, Zonglin Li, Siheng Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on HM3D and HSSD benchmarks show that BeliefMapNav achieves state-of-the-art (SOTA) Success Rate (SR) and Success weighted by Path Length (SPL), with a notable 9.7 SPL improvement over the previous best SR method, validating its effectiveness and efficiency. The source code is publicly available at: https://github.com/ZiboKNOW/BeliefMapNav
Researcher Affiliation	Academia	1Shanghai Jiao Tong University 2University of Michigan
Pseudocode	Yes	The simulated annealing algorithm is a probabilistic optimization algorithm that can be used to find an approximate solution to the path planning problem. The algorithm is inspired by the annealing process in metallurgy, where a material is heated and then cooled to remove defects and improve its properties. The algorithm works by iteratively exploring the solution space and accepting or rejecting new solutions based on their cost and a temperature parameter. Our implementation is based on the following steps: 1. Initialization: Set the initial and terminal temperature T0 and Tf, the cooling rate α, and the number of samples to simulate N. When the current temperature T is greater than the terminal temperature Tf, the algorithm will continue to run. 2. Iterative Process: While the termination criterion is not met, for each sample, the algorithm will perform the following steps: (a) Generating Neighbor Solution: Generate a neighbor solution π′ from the current solution π by applying three kinds of operations: swap, shift, or reverse. (1) swap: swap two points in the path. (2) shift: move a segment of the path to a different position. (3) reverse: reverse a segment of the path. The repetition times of the operations are controlled by the temperature T, which decreases with time. Due to different operations having different degrees of impact on π, the probability of selecting each operation is different. And because the first point in the path is the starting point, it does not participate in this transformation. (b) Evaluation and Acceptance: Evaluate the cost of the new path W(π′) and apply the Metropolis Criterion: Compare it with the cost of the current path W(π). If W(π′) < W(π), accept the new path. Otherwise, accept it with a probability of exp(W(π) − W(π′)). (c) Cooling: Update the temperature T by multiplying it with the cooling rate α. 3. Termination: The algorithm ends when the temperature T is less than the terminal temperature Tf. The final output is the sample with the lowest cost.
Open Source Code	Yes	The source code is publicly available at: https://github.com/ZiboKNOW/BeliefMapNav
Open Datasets	Yes	We evaluate our method on three standard benchmarks: HM3D [19], MP3D [26] and HSSD [20].
Dataset Splits	Yes	HM3D, the official dataset of the Habitat 2022 ObjectNav Challenge, includes 2,000 validation episodes across 20 environments and 6 object categories. MP3D, a large-scale indoor 3D scene dataset, is commonly used in Habitat-based ObjectNav evaluations. We conduct experiments on its validation set, consisting of 11 environments, 21 object categories, and 2,195 object-goal navigation episodes. HSSD, a synthetic dataset with scenes based on real house layouts, contains 40 validation scenes, 1,248 navigation episodes, and 6 object categories.
Hardware Specification	Yes	The system runs on a single RTX 4090 (24 GB VRAM) and uses approximately 13 GB of VRAM. ... On a laptop with Intel Core i5-12500H CPU, 16GB RAM, and NVIDIA GeForce RTX 2050 Laptop GPU, for a task of 10 frontiers, the algorithm takes about 0.2 seconds to solve the problem.
Software Dependencies	No	The algorithm is implemented in Python and uses the CuPy [52] library to accelerate the process.
Experiment Setup	Yes	We limit navigation to 500 steps, defining success as stopping within 0.1 m of the target. Each step moves the agent 0.25 m forward or rotates it by 30°. The RGB-D camera, mounted 0.88 m high, captures 640 × 480 images. The 3D voxel map has 45,000 voxels at 0.25 m resolution. We set w_unobserved = 0.01 (Sec. 3.4.2). CLIP-ViT-B-32 encodes visual/text features with image crop scale k = 3. GPT-4o generates three landmarks per level (nine total). Hierarchical scorer weights are w1 = 0.05, w2 = 0.1, w3 = 2, w4 = 0.01.
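
The simulated annealing steps quoted in the Pseudocode row can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' CuPy implementation: the function names, default parameters, operation probabilities, and the rule tying the number of repeated operations to the temperature are all hypothetical. The sketch also assumes that the N samples are independent paths annealed side by side, and that the acceptance probability divides the cost difference by T (the standard Metropolis form).

```python
import math
import random


def path_cost(path, dist):
    """Total length of an open path that visits the points in order."""
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def neighbor(path, rng, repeats=1):
    """Apply `repeats` random swap/shift/reverse moves; index 0 (the start) stays fixed."""
    new = list(path)
    n = len(new)
    if n < 3:
        return new  # nothing to permute besides the fixed start
    for _ in range(repeats):
        i, j = sorted(rng.sample(range(1, n), 2))
        # Assumed (illustrative) selection probabilities for the three operations.
        op = rng.choices(("swap", "shift", "reverse"), weights=(0.4, 0.3, 0.3))[0]
        if op == "swap":
            new[i], new[j] = new[j], new[i]
        elif op == "shift":
            segment = new[i:j + 1]
            rest = new[:i] + new[j + 1:]
            k = rng.randint(1, len(rest))  # reinsert somewhere after the start
            new = rest[:k] + segment + rest[k:]
        else:  # reverse the segment in place
            new[i:j + 1] = reversed(new[i:j + 1])
    return new


def simulated_annealing(dist, t0=10.0, tf=1e-3, alpha=0.95, n_samples=8, seed=0):
    """Anneal n_samples paths from T0 down to Tf; return the lowest-cost final sample."""
    rng = random.Random(seed)
    samples = [list(range(len(dist))) for _ in range(n_samples)]
    costs = [path_cost(p, dist) for p in samples]
    t = t0
    while t > tf:
        # Assumed schedule: more operations per proposal while the temperature is high.
        repeats = 1 + int(2 * t / t0)
        for s in range(n_samples):
            cand = neighbor(samples[s], rng, repeats)
            cand_cost = path_cost(cand, dist)
            delta = cand_cost - costs[s]
            # Metropolis criterion: always accept improvements,
            # accept worse paths with probability exp(-delta / T).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                samples[s], costs[s] = cand, cand_cost
        t *= alpha  # cooling step
    best = min(range(n_samples), key=costs.__getitem__)
    return samples[best], costs[best]


# Usage: six points on a line, listed in a deliberately poor visiting order.
xs = [0, 3, 1, 4, 2, 5]
dist = [[abs(a - b) for b in xs] for a in xs]
path, cost = simulated_annealing(dist)
print(path, cost)
```

Keeping index 0 out of every operation mirrors the statement that the starting point does not participate in the transformation. Returning the cheapest of the final samples, rather than the best path ever seen, follows the row's last sentence.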