VTNet: Visual Transformer Network for Object Goal Navigation

Authors: Heming Du, Xin Yu, Liang Zheng

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in the artificial environment AI2-Thor demonstrate that VTNet significantly outperforms state-of-the-art methods in unseen testing environments.
Researcher Affiliation | Academia | Heming Du (1,3), Xin Yu (2) & Liang Zheng (1); 1: Australian National University, 2: University of Technology Sydney, 3: CSIRO-DATA61
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | Our codes and pre-trained model will be publicly released for reproducibility.
Open Datasets | Yes | We perform our experiments on AI2-Thor (Kolve et al., 2017), an artificial 3D environment with realistic photos. (A minimal environment-usage sketch follows the table.)
Dataset Splits | Yes | We use the same training and evaluation protocols as the works (Wortsman et al., 2019; Du et al., 2020): 80 of the 120 rooms are selected as the training set, 20 rooms from each scene type, and the remaining 40 rooms are divided equally into validation and test sets. (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma & Ba, 2014)" and "DETR as the object detector" but does not specify version numbers for these or any other software components.
Experiment Setup | Yes | We use a two-stage training strategy. In Stage 1, we train our visual transformer for 20 epochs with the supervision of optimal action instructions. In Stage 2, we train the navigation policy for 6M episodes in total with 16 asynchronous agents. We set a penalization of 0.001 on each action step and a large reward of 5 when an agent completes an episode successfully. We use the Adam optimizer (Kingma & Ba, 2014) to update the policy network with a learning rate of 10^-4 and the pre-trained VT with a learning rate of 10^-5. (A training-setup sketch follows the table.)
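
For context on the environment itself, here is a minimal sketch of driving AI2-THOR through its Python package (ai2thor); the scene name and action below are illustrative choices, not settings taken from the paper.

from ai2thor.controller import Controller

# Launch an AI2-THOR scene; FloorPlan1 (a kitchen) is an arbitrary example.
controller = Controller(scene="FloorPlan1", gridSize=0.25)

# Take one navigation step and inspect the result.
event = controller.step(action="MoveAhead")
print(event.metadata["lastActionSuccess"])  # True if the agent could move
print(event.frame.shape)                    # egocentric RGB observation

controller.stop()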
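
The 80/20/20 room split can be made concrete with a short sketch. The per-type room indices below (first 20 rooms of each scene type for training, then 5 for validation and 5 for testing) follow the convention of prior AI2-THOR navigation work and are an assumption; the paper does not list exact room indices.

# Hypothetical reconstruction of the room split; exact indices are assumed.
SCENE_TYPE_OFFSETS = {"kitchen": 0, "living_room": 200, "bedroom": 300, "bathroom": 400}

def make_split():
    """Return train/val/test AI2-THOR scene names (20/5/5 rooms per scene type)."""
    train, val, test = [], [], []
    for offset in SCENE_TYPE_OFFSETS.values():
        rooms = [f"FloorPlan{offset + i}" for i in range(1, 31)]  # 30 rooms per type
        train += rooms[:20]    # 80 training rooms overall
        val += rooms[20:25]    # 20 validation rooms overall
        test += rooms[25:30]   # 20 test rooms overall
    return train, val, test

train_scenes, val_scenes, test_scenes = make_split()
assert (len(train_scenes), len(val_scenes), len(test_scenes)) == (80, 20, 20)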
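
The Stage-2 optimizer configuration and reward shaping can likewise be sketched, assuming a PyTorch implementation; policy_net and visual_transformer are placeholder modules standing in for the paper's navigation policy and pre-trained visual transformer.

import torch

# Stand-in modules; the real architectures are the paper's policy network and VT.
policy_net = torch.nn.Linear(512, 6)            # placeholder policy network
visual_transformer = torch.nn.Linear(512, 512)  # placeholder pre-trained VT

# One Adam optimizer with per-module learning rates, as described in the paper:
# 1e-4 for the policy network and 1e-5 for the pre-trained visual transformer.
optimizer = torch.optim.Adam([
    {"params": policy_net.parameters(), "lr": 1e-4},
    {"params": visual_transformer.parameters(), "lr": 1e-5},
])

STEP_PENALTY = -0.001  # penalization on each action step
SUCCESS_REWARD = 5.0   # large reward for completing an episode successfully

def reward(episode_succeeded: bool) -> float:
    """Per-step reward signal following the paper's experiment setup."""
    return SUCCESS_REWARD if episode_succeeded else STEP_PENALTY

Using a single optimizer with two parameter groups keeps both learning rates in one update step, which matches the description of updating the policy network and fine-tuning the pre-trained VT at a lower rate.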