Monocular Camera-Based Point-Goal Navigation by Learning Depth Channel and Cross-Modality Pyramid Fusion

Authors: Tianqi Tang, Heming Du, Xin Yu, Yi Yang (pp. 5422-5430)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the Gibson benchmark demonstrate that Geo-Nav outperforms the state-of-the-art in terms of efficiency and effectiveness.
Researcher Affiliation | Academia | Tianqi Tang¹*, Heming Du²*, Xin Yu¹, Yi Yang¹ (¹ReLER, AAII, University of Technology Sydney, Australia; ²Australian National University); Tang@student.uts.edu.au, heming.du@anu.edu.au, {xin.yu, yi.yang}@uts.edu.au
Pseudocode | No | The paper describes its methods in text and figures but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper provides no explicit statement about releasing open-source code and no link to a code repository for the described method.
Open Datasets | Yes | We train and evaluate agents on the platform Habitat (Savva et al. 2019), which provides agents with photo-realistic images through virtual simulations. All the experiments are conducted on the Gibson dataset (Xia et al. 2018). (A minimal environment-setup sketch follows the table.)
Dataset Splits | Yes | Gibson contains real-world indoor scenarios, including 5 million episodes in 72 indoor environments for training and 994 episodes in 14 unseen environments for evaluation.
Hardware Specification | Yes | Additionally, we train Geo-Nav with 100 million training steps on 1 Nvidia V100 GPU for seven days.
Software Dependencies | No | The paper mentions using the Adam optimizer and certain network architectures (ResNet18, PPO, LSTM), but it does not specify exact version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | We employ the Adam optimizer (Kingma and Ba 2014) with the batch size of 12 and a learning rate of 10^-5. ... We utilize Adam optimizer with a learning rate of 2.5 × 10^-4 to train the proposed agent within 10 million steps. Finally, we fine-tune the agent and employ a lower learning rate of 10^-4 for RL. ... We set λ = 0.85 for all the experiments. ... We empirically set λ1 = 1, λ2 = 0.1 and λ3 = 0.5 to achieve good quality of depth maps. (A training-configuration sketch follows the table.)
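
For the Open Datasets row, a minimal sketch of stepping through a PointNav episode on Gibson via Habitat, assuming habitat-lab's legacy config layout; the YAML path and the random-action placeholder are illustrative and not taken from the paper.

```python
import habitat

# Stock habitat-lab PointNav task config for Gibson (assumed path; adjust
# to the installed habitat-lab version and local dataset location).
config = habitat.get_config("configs/tasks/pointnav_gibson.yaml")

env = habitat.Env(config=config)
observations = env.reset()  # sensor dict, e.g. "rgb" and "pointgoal"

while not env.episode_over:
    # Random-action placeholder; Geo-Nav's learned policy would act here.
    observations = env.step(env.action_space.sample())

env.close()
```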
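
For the Experiment Setup row, a hedged PyTorch sketch of one depth-supervision step using the reported hyperparameters (Adam, batch size 12, learning rate 10^-5, loss weights λ1 = 1, λ2 = 0.1, λ3 = 0.5); the network and the individual loss terms are stand-ins, since the paper's exact formulation is not restated in this report.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in depth network; the paper uses a ResNet18-based architecture.
depth_net = nn.Sequential(nn.Conv2d(3, 1, kernel_size=3, padding=1))

# Reported depth-training hyperparameters: Adam, batch size 12, lr 10^-5.
# (RL training reportedly used lr 2.5e-4, fine-tuned at 1e-4, with λ = 0.85.)
optimizer = torch.optim.Adam(depth_net.parameters(), lr=1e-5)
lam1, lam2, lam3 = 1.0, 0.1, 0.5  # reported loss weights

rgb = torch.rand(12, 3, 64, 64)       # batch of 12 (hypothetical resolution)
gt_depth = torch.rand(12, 1, 64, 64)  # placeholder ground-truth depth

pred = depth_net(rgb)
term1 = F.l1_loss(pred, gt_depth)  # stand-in reconstruction term
term2 = torch.zeros(())            # stand-in, e.g. a gradient-matching term
term3 = torch.zeros(())            # stand-in, e.g. a structural-similarity term
loss = lam1 * term1 + lam2 * term2 + lam3 * term3

optimizer.zero_grad()
loss.backward()
optimizer.step()
```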