GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance

Authors: Shuaihang Yuan, Hao Huang, Yu Hao, Congcong Wen, Anthony Tzes, Yi Fang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive experiments conducted on HM3D and Gibson benchmark datasets demonstrate improvements in Success Rate and Success weighted by Path Length, underscoring the efficacy of our geometric-part and affordance-guided navigation approach in enhancing robot autonomy and versatility, without any additional object-specific training or fine-tuning with the semantics of unseen objects and/or the locomotions of the robot."
Researcher Affiliation | Academia | Shuaihang Yuan (1,2,4), Hao Huang (2,4), Yu Hao (2,3,4), Congcong Wen (2,4), Anthony Tzes (1,2), Yi Fang (1,2,3,4). 1: NYUAD Center for Artificial Intelligence and Robotics (CAIR), Abu Dhabi, UAE. 2: New York University Abu Dhabi, Electrical Engineering, Abu Dhabi 129188, UAE. 3: New York University, Electrical & Computer Engineering Dept., Brooklyn, NY 11201, USA. 4: Embodied AI and Robotics (AIR) Lab, NYU Abu Dhabi, UAE.
Pseudocode | No | The paper includes pipeline diagrams but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Our project is available at https://shalexyuan.github.io/GAMap/."
Open Datasets | Yes | "Datasets. HM3D [23] is a dataset... Gibson [24] was developed by Al-Halah et al. [43]."
Dataset Splits | Yes | "We follow the validation settings from [3, 32] to evaluate our proposed method. ... We use 2000 episodes on the validation split of HM3D to report the results. Similarly, we follow this method [19] to produce the results on the Gibson dataset."
Hardware Specification | Yes | "We use a Titan XP GPU for the experiment evaluation, and the entire evaluation process takes around 44 hours."
Software Dependencies | No | The paper mentions using CLIP and GPT-4V but does not provide specific version numbers for these or other software components.
Experiment Setup | Yes | "In our experiment... We set N_a to 1 and N_g to 3 for the experiments on the HM3D and Gibson datasets. For the partition process, we use three scaling levels in all our experiments: the first level is the original image, the second level has 4 equal-sized patches, and the third level has 16 equal-sized patches. We use CLIP as the pre-trained visual and text encoder."
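
The Experiment Setup row describes the paper's multi-scale partition: each RGB observation is scored at three levels (the full image, a 2x2 grid of 4 patches, and a 4x4 grid of 16 patches) with a pre-trained CLIP encoder. Below is a minimal sketch of that step, assuming the HuggingFace transformers checkpoint openai/clip-vit-base-patch32; the function names (multiscale_patches, patch_scores) and the cosine-similarity scoring are illustrative stand-ins, not taken from the authors' released code.

```python
# Minimal sketch of the multi-scale partition described in the
# Experiment Setup row above -- NOT the authors' released code.
# Assumes a HuggingFace `transformers` CLIP checkpoint; the three
# levels (1 / 4 / 16 equal-sized patches) follow the paper's setup.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def multiscale_patches(image, levels=(1, 2, 4)):
    """Split `image` into an n x n grid at each level.

    levels=(1, 2, 4) yields 1 + 4 + 16 equal-sized patches, matching
    the three scaling levels used in the paper's experiments.
    """
    w, h = image.size
    patches = []
    for n in levels:
        pw, ph = w // n, h // n
        for i in range(n):
            for j in range(n):
                patches.append(
                    image.crop((i * pw, j * ph, (i + 1) * pw, (j + 1) * ph))
                )
    return patches

@torch.no_grad()
def patch_scores(image, attribute_prompts):
    """Score every multi-scale patch against each attribute prompt.

    Returns a (num_patches, num_prompts) cosine-similarity matrix;
    taking the max over the patch axis gives one score per attribute.
    """
    patches = multiscale_patches(image)
    inputs = processor(text=attribute_prompts, images=patches,
                       return_tensors="pt", padding=True)
    out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return img @ txt.T
```

If N_a and N_g count the affordance and geometric-part attributes scored per goal object (the quoted setup does not define them), attribute_prompts would hold four strings per object under the paper's N_a = 1, N_g = 3 setting, and the per-attribute maxima over patches would be the values projected onto the map.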