VTNet: Visual Transformer Network for Object Goal Navigation
Authors: Heming Du, Xin Yu, Liang Zheng
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in the artificial environment AI2-Thor demonstrate that VTNet significantly outperforms state-of-the-art methods in unseen testing environments. |
| Researcher Affiliation | Academia | Heming Du¹,³, Xin Yu² & Liang Zheng¹ (¹Australian National University, ²University of Technology Sydney, ³CSIRO-DATA61) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our codes and pre-trained model will be publicly released for reproducibility. |
| Open Datasets | Yes | We perform our experiments on AI2-Thor (Kolve et al., 2017), an artificial 3D environment with realistic photos. |
| Dataset Splits | Yes | We use the same training and evaluation protocols as prior work (Wortsman et al., 2019; Du et al., 2020). 80 of the 120 rooms are selected as the training set (20 rooms per scene type), and the remaining 40 rooms are divided equally into validation and test sets (a hedged sketch of this split follows the table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Adam optimizer (Kingma & Ba, 2014)" and "DETR as the object detector" but does not specify version numbers for these or any other software components. |
| Experiment Setup | Yes | We use a two-stage training strategy. In Stage 1, we train our visual transformer for 20 epochs with the supervision of optimal action instructions. In Stage 2, we train the navigation policy for 6M episodes in total with 16 asynchronous agents. We set a penalty of 0.001 on each action step and a large reward of 5 when an agent completes an episode successfully. We use the Adam optimizer (Kingma & Ba, 2014) to update the policy network with a learning rate of 10⁻⁴ and the pre-trained VT with a learning rate of 10⁻⁵ (a configuration sketch follows the table). |
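
To make the Dataset Splits row concrete, here is a minimal Python sketch of the 80/20/20 room split. The FloorPlan ranges follow AI2-THOR's standard naming convention; assigning the first 20 rooms per scene type to training and halving the remaining 10 between validation and test is an assumption consistent with the protocol of Wortsman et al. (2019), not a detail quoted from the paper.

```python
# Hedged reconstruction of the 80/20/20 room split over AI2-THOR.
# Offsets follow AI2-THOR's FloorPlan naming (kitchens 1-30,
# living rooms 201-230, bedrooms 301-330, bathrooms 401-430).
SCENE_OFFSETS = {"kitchen": 0, "living_room": 200, "bedroom": 300, "bathroom": 400}

splits = {"train": [], "val": [], "test": []}
for offset in SCENE_OFFSETS.values():
    rooms = [f"FloorPlan{offset + i}" for i in range(1, 31)]  # 30 rooms per type
    splits["train"].extend(rooms[:20])    # 4 x 20 = 80 training rooms
    splits["val"].extend(rooms[20:25])    # 4 x 5  = 20 validation rooms
    splits["test"].extend(rooms[25:30])   # 4 x 5  = 20 test rooms

assert len(splits["train"]) == 80 and len(splits["val"]) == len(splits["test"]) == 20
```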
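Similarly, the Experiment Setup row reduces to a handful of hyperparameters. The sketch below shows one plausible reading, assuming PyTorch (the paper does not name its framework); `policy_net` and `visual_transformer` are stand-in placeholder modules, not the authors' architecture.

```python
# Hypothetical sketch of the Stage-2 hyperparameters described above.
import torch

STEP_PENALTY = -0.001        # penalty applied on each action step
SUCCESS_REWARD = 5.0         # reward when an episode ends successfully
NUM_AGENTS = 16              # asynchronous agents
TOTAL_EPISODES = 6_000_000   # 6M episodes in total

def reward(done: bool, success: bool) -> float:
    """Reward shaping as described in the Experiment Setup row."""
    if done and success:
        return SUCCESS_REWARD
    return STEP_PENALTY

# Separate learning rates: 1e-4 for the policy network, 1e-5 for the
# pre-trained visual transformer. The modules below are stand-ins.
policy_net = torch.nn.Linear(512, 6)
visual_transformer = torch.nn.Linear(512, 512)

optimizer = torch.optim.Adam([
    {"params": policy_net.parameters(), "lr": 1e-4},
    {"params": visual_transformer.parameters(), "lr": 1e-5},
])
```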