VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

Authors: Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang, Chang Liu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluation on HM3D and HSSD validates VoroNav surpasses existing benchmarks in both success rate and exploration efficiency (absolute improvement: +2.8% Success and +3.7% SPL on HM3D, +2.6% Success and +3.8% SPL on HSSD).
Researcher Affiliation | Collaboration | ¹Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University, Beijing, China. ²The University of Hong Kong. ³OpenGVLab, Shanghai AI Laboratory. ⁴National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China.
Pseudocode | Yes | Algorithm 1 Navigation Process of VoroNav; Algorithm 2 Look Around
Open Source Code | Yes | Project page: https://voro-nav.github.io
Open Datasets | Yes | The HM3D dataset provides 20 high-fidelity reconstructions of entire buildings and contains 2K validation episodes for object navigation tasks. The HSSD dataset provides 40 high-quality synthetic scenes and contains 1.2K validation episodes for object navigation.
Dataset Splits | Yes | The HM3D dataset provides 20 high-fidelity reconstructions of entire buildings and contains 2K validation episodes for object navigation tasks. The HSSD dataset provides 40 high-quality synthetic scenes and contains 1.2K validation episodes for object navigation.
Hardware Specification | Yes | These experimental results were obtained using a computer equipped with a 13th-generation Intel Core i7-13700KF CPU and an Nvidia RTX 4070 GPU with 12GB of memory.
Software Dependencies | No | The paper mentions models like Grounded-SAM, BLIP, and GPT-3.5 with their corresponding citations, but does not specify software dependencies like Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | The agent's action space is {Stop, Move Forward, Turn Left, Turn Right, Look Up, Look Down}, with a discrete movement increment of 0.25m and discrete rotations of 30°.
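
The action space quoted in the Experiment Setup entry can be illustrated with a small sketch. This is not the authors' code: the `Action` enum, the `step` function, the pose layout `(x, y, heading_deg, pitch_deg)`, the counterclockwise-positive heading convention, and the use of the 30° increment for Look Up and Look Down are all assumptions made for illustration. Only the six action names, the 0.25 m step, and the 30° rotation come from the quoted text.

```python
import math
from enum import Enum


class Action(Enum):
    """The six discrete actions named in the quoted setup (enum itself is hypothetical)."""
    STOP = "Stop"
    MOVE_FORWARD = "Move Forward"
    TURN_LEFT = "Turn Left"
    TURN_RIGHT = "Turn Right"
    LOOK_UP = "Look Up"
    LOOK_DOWN = "Look Down"


FORWARD_STEP_M = 0.25   # discrete movement increment from the quoted setup
ROTATION_STEP_DEG = 30.0  # discrete rotation from the quoted setup


def step(pose, action):
    """Apply one action to a pose (x, y, heading_deg, pitch_deg).

    Assumptions (not from the source): heading 0 points along +x,
    left turns are counterclockwise, and camera tilt also uses 30 degrees.
    """
    x, y, heading, pitch = pose
    if action is Action.MOVE_FORWARD:
        rad = math.radians(heading)
        x += FORWARD_STEP_M * math.cos(rad)
        y += FORWARD_STEP_M * math.sin(rad)
    elif action is Action.TURN_LEFT:
        heading = (heading + ROTATION_STEP_DEG) % 360
    elif action is Action.TURN_RIGHT:
        heading = (heading - ROTATION_STEP_DEG) % 360
    elif action is Action.LOOK_UP:
        pitch += ROTATION_STEP_DEG
    elif action is Action.LOOK_DOWN:
        pitch -= ROTATION_STEP_DEG
    # Action.STOP leaves the pose unchanged.
    return (x, y, heading, pitch)
```

Under these assumptions, twelve consecutive Turn Left actions bring the agent back to its starting heading, and four Move Forward actions cover one metre.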