Towards Versatile Embodied Navigation
Authors: Hanqing Wang, Wei Liang, Luc Van Gool, Wenguan Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that, compared with learning each visual navigation task individually, our multitask agent achieves comparable or even better performance with reduced complexity. |
| Researcher Affiliation | Academia | Hanqing Wang (1,2), Wei Liang (1,4), Luc Van Gool (2), Wenguan Wang (3). 1: Beijing Institute of Technology; 2: Computer Vision Lab, ETH Zurich; 3: ReLER, AAII, University of Technology Sydney; 4: Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks labeled as such. |
| Open Source Code | Yes | The code and full dataset is available on the project page: https://github.com/hanqingwangai/VXN. |
| Open Datasets | Yes | In response, a large-scale 3D dataset, VXN, is established to investigate multitask multimodal embodied navigation in audiovisual complex indoor environments. The code and full dataset is available on the project page: https://github.com/hanqingwangai/VXN. |
| Dataset Splits | Yes | We use the standard 58/11/18 train/val/test split [115] of MP3D environments. |
| Hardware Specification | Yes | VIENNA is trained on 32 RTX 2080 GPUs for 180M frames, costing 4,608 GPU hours. |
| Software Dependencies | No | The paper mentions software components like ImageNet-pretrained ResNet50, AdamW optimizer, and Habitat Simulator with its license. However, it does not provide specific version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other key libraries used in the experimental setup to enable full reproducibility. |
| Experiment Setup | Yes | We use AdamW [123] optimizer with a learning rate of 2.5×10⁻⁴. We set other hyper-parameters as: d=512, N_I=16, N_L=120, N_G=120. (A configuration sketch follows the table.) |
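As a minimal sketch of the quoted experiment setup, the snippet below instantiates an AdamW optimizer with the reported learning rate of 2.5×10⁻⁴ and records the reported hyper-parameters. It assumes PyTorch (the framework is not versioned in the paper), and the `ViennaAgent` class, its layer, and its constructor arguments are hypothetical placeholders, not the authors' released code.

```python
# Sketch of the reported training configuration (assumptions: PyTorch;
# ViennaAgent is a hypothetical stand-in for the VIENNA agent).
import torch
import torch.nn as nn


class ViennaAgent(nn.Module):
    """Placeholder for the VIENNA multitask navigation agent."""

    def __init__(self, d=512, n_i=16, n_l=120, n_g=120):
        super().__init__()
        # Hyper-parameters quoted from the paper: d=512, N_I=16,
        # N_L=120, N_G=120.
        self.d, self.n_i, self.n_l, self.n_g = d, n_i, n_l, n_g
        # Dummy layer so the optimizer has parameters to update.
        self.proj = nn.Linear(d, d)

    def forward(self, x):
        return self.proj(x)


agent = ViennaAgent()
# AdamW with the quoted learning rate of 2.5e-4; other optimizer
# settings are left at PyTorch defaults, as the paper does not state them.
optimizer = torch.optim.AdamW(agent.parameters(), lr=2.5e-4)
```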