VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
Authors: Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang, Chang Liu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation on HM3D and HSSD validates that VoroNav surpasses existing benchmarks in both success rate and exploration efficiency (absolute improvements: +2.8% Success and +3.7% SPL on HM3D, +2.6% Success and +3.8% SPL on HSSD; see the SPL sketch below the table). |
| Researcher Affiliation | Collaboration | 1Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University, Beijing, China. 2The University of Hong Kong. 3OpenGVLab, Shanghai AI Laboratory. 4National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 Navigation Process of VoroNav; Algorithm 2 Look Around (see the Look Around sketch below the table) |
| Open Source Code | Yes | Project page: https://voro-nav.github.io |
| Open Datasets | Yes | The HM3D dataset provides 20 high-fidelity reconstructions of entire buildings and contains 2K validation episodes for object navigation tasks. The HSSD dataset provides 40 high-quality synthetic scenes and contains 1.2K validation episodes for object navigation. |
| Dataset Splits | Yes | The HM3D dataset provides 20 high-fidelity reconstructions of entire buildings and contains 2K validation episodes for object navigation tasks. The HSSD dataset provides 40 high-quality synthetic scenes and contains 1.2K validation episodes for object navigation. |
| Hardware Specification | Yes | These experimental results were obtained using a computer equipped with a 13th-generation Intel Core i7-13700KF CPU and an Nvidia RTX 4070 GPU with 12GB of memory. |
| Software Dependencies | No | The paper mentions models like Grounded-SAM, BLIP, and GPT-3.5 with their corresponding citations, but does not specify software dependencies like Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | The agent's action space is {Stop, Move Forward, Turn Left, Turn Right, Look Up, Look Down}, with a discrete movement increment of 0.25 m and discrete rotations of 30° (see the action-space sketch below the table). |
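
For reference on the efficiency metric quoted in the Research Type row, below is a minimal sketch of SPL (Success weighted by Path Length, the standard ObjectNav efficiency metric introduced by Anderson et al., 2018). Variable names are illustrative; this is not the authors' evaluation code.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length.

    successes[i]        -- 1 if episode i reached the goal, else 0
    shortest_lengths[i] -- geodesic shortest-path length for episode i
    path_lengths[i]     -- length of the path the agent actually took
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        # An episode contributes l / max(p, l) if successful, 0 otherwise.
        total += s * l / max(p, l)
    return total / len(successes)
```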
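The action space and discretization in the Experiment Setup row can be written down as a small set of Python constants. This is a hypothetical sketch of such a configuration, not the project's actual config file.

```python
from enum import Enum

class Action(Enum):
    STOP = 0
    MOVE_FORWARD = 1   # advances the agent by 0.25 m
    TURN_LEFT = 2      # rotates 30 degrees counter-clockwise
    TURN_RIGHT = 3     # rotates 30 degrees clockwise
    LOOK_UP = 4        # tilts the camera up by 30 degrees
    LOOK_DOWN = 5      # tilts the camera down by 30 degrees

FORWARD_STEP_M = 0.25  # discrete movement increment
TURN_ANGLE_DEG = 30    # discrete rotation increment
```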
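Given the 30° turn increment, Algorithm 2 ("Look Around") plausibly amounts to a full in-place rotation that collects one observation per turn. The sketch below is written under that assumption; `env.step` is an assumed Habitat-style interface, not the authors' actual code.

```python
def look_around(env, turn_angle_deg=30):
    """Hypothetical 'Look Around' primitive: rotate in place through a
    full circle, collecting one observation per 30-degree turn
    (12 views in total for the paper's discretization)."""
    observations = []
    for _ in range(360 // turn_angle_deg):
        observations.append(env.step("turn_left"))
    return observations
```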