Towards Versatile Embodied Navigation

Authors: Hanqing Wang, Wei Liang, Luc Van Gool, Wenguan Wang

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We empirically demonstrate that, compared with learning each visual navigation task individually, our multitask agent achieves comparable or even better performance with reduced complexity. |
| Researcher Affiliation | Academia | Hanqing Wang (1,2), Wei Liang (1,4), Luc Van Gool (2), Wenguan Wang (3). 1: Beijing Institute of Technology; 2: Computer Vision Lab, ETH Zurich; 3: ReLER, AAII, University of Technology Sydney; 4: Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks labeled as such. |
| Open Source Code | Yes | The code and full dataset is available on the project page: https://github.com/hanqingwangai/VXN. |
| Open Datasets | Yes | In response, a large-scale 3D dataset, VXN, is established to investigate multitask multimodal embodied navigation in audiovisual complex indoor environments. The code and full dataset is available on the project page: https://github.com/hanqingwangai/VXN. |
| Dataset Splits | Yes | We use the standard 58/11/18 train/val/test split [115] of MP3D environments. |
| Hardware Specification | Yes | VIENNA is trained on 32 RTX 2080 GPUs for 180M frames, costing 4,608 GPU hours. |
| Software Dependencies | No | The paper mentions software components such as an ImageNet-pretrained ResNet50, the AdamW optimizer, and the Habitat simulator (with its license), but it does not give version numbers for the programming language (e.g., Python), the deep learning framework (e.g., PyTorch, TensorFlow), or other key libraries, which limits full reproducibility. |
| Experiment Setup | Yes | We use AdamW [123] optimizer with a learning rate of 2.5×10⁻⁴. We set other hyper-parameters as: d=512, N_I=16, N_L=120, N_G=120. (See the illustrative sketch below.) |
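The Experiment Setup row reports an AdamW optimizer with a learning rate of 2.5×10⁻⁴ and hyper-parameters d=512, N_I=16, N_L=120, N_G=120. Below is a minimal PyTorch sketch of wiring those reported values into an optimizer; the placeholder model and the roles attributed to N_I, N_L, and N_G in the comments are assumptions for illustration, not the authors' VIENNA implementation.

```python
# Minimal sketch only: the model below is a hypothetical stand-in, not VIENNA.
import torch
from torch.optim import AdamW

# Hyper-parameter values as reported in the paper; their exact roles are assumptions.
D = 512      # hidden/embedding dimension d
N_I = 16     # reported hyper-parameter N_I
N_L = 120    # reported hyper-parameter N_L
N_G = 120    # reported hyper-parameter N_G

# Placeholder network using the reported hidden size; the real agent is a
# multitask navigation model trained in the Habitat simulator.
model = torch.nn.Sequential(
    torch.nn.Linear(D, D),
    torch.nn.ReLU(),
    torch.nn.Linear(D, D),
)

# AdamW with the reported learning rate of 2.5e-4.
optimizer = AdamW(model.parameters(), lr=2.5e-4)
```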