A Dual Semantic-Aware Recurrent Global-Adaptive Network for Vision-and-Language Navigation

Authors: Liuyi Wang, Zongtao He, Jiagui Tang, Ronghao Dang, Naijia Wang, Chengju Liu, Qijun Chen

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on the R2R and REVERIE datasets demonstrate that our method achieves better performance than existing methods.
Researcher Affiliation | Academia | 1 Tongji University, Shanghai, China; 2 Tongji Artificial Intelligence (Suzhou) Research Institute, Suzhou, China. {wly, xingchen327, 2130701, dangronghao, 2030715, liuchengju, qjchen}@tongji.edu.cn
Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. Figures present architectural diagrams rather than structured code steps.
Open Source Code | Yes | Code is available at https://github.com/CrystalSixone/DSRG.
Open Datasets | Yes | To validate our proposed method, we conduct extensive experiments on the R2R [Anderson et al., 2018] and REVERIE datasets [Qi et al., 2020b].
Dataset Splits | Yes | For R2R, four standard metrics are used for evaluation: the navigation error (NE): the distance between the ground truth and the agent's stop position; the success rate (SR): the ratio of paths that stop within 3 m of the target points; the oracle success rate (OSR): SR with the oracle stop policy; and the success rate weighted by the path length (SPL): SR penalized by the path length. For REVERIE, another two metrics are added: remote grounding success rate (RGS): the ratio of objects grounded correctly, and RGS weighted by the path length (RGSPL). ... Table 1: Comparison with the state-of-the-art methods on the R2R dataset (includes "Validation Seen" and "Validation Unseen" columns). (An SR/SPL computation is sketched in code after the table.)
Hardware Specification | Yes | In the pre-training stage, we train our DSRG with batch size 24 for 400k iterations using 1 NVIDIA RTX 3090 GPU.
Software Dependencies | No | The paper mentions using a "BERT model" and "ViT-B/16" for feature extraction, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, specific library versions). (A loading sketch with assumed libraries follows the table.)
Experiment Setup | Yes | In the pre-training stage, we train our DSRG with batch size 24 for 400k iterations using 1 NVIDIA RTX 3090 GPU. ... During fine-tuning, the batch size and the learning rate are 4 and 5 × 10⁻⁶, respectively. ... The numbers of transformer layers for instructions, visual and semantic features, and local-global cross-modal attention modules are 9, 2 and 4, respectively. (These values are collected into a configuration sketch below.)
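
For reference, the SR and SPL metrics quoted in the Dataset Splits row follow the standard definitions of Anderson et al. [2018]. Below is a minimal sketch of how they are computed; the function and argument names are hypothetical and not taken from the authors' evaluation code:

    import numpy as np

    def sr_and_spl(agent_path_len, shortest_path_len, final_dist_to_goal, threshold=3.0):
        # SR: fraction of episodes whose stop position is within 3 m of the target.
        p = np.asarray(agent_path_len, dtype=float)       # length of the path the agent took
        l = np.asarray(shortest_path_len, dtype=float)    # ground-truth shortest-path length
        d = np.asarray(final_dist_to_goal, dtype=float)   # stop position to goal distance
        success = (d <= threshold).astype(float)
        # SPL: success weighted by path efficiency, l / max(p, l).
        spl = success * l / np.maximum(p, l)
        return success.mean(), spl.mean()

OSR differs from SR only in replacing the agent's stop decision with an oracle stop policy, and the REVERIE-specific RGS/RGSPL apply the same success and path-length weighting to object grounding rather than navigation endpoints.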
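Since the paper names only a "BERT model" and ViT-B/16 without pinning libraries or versions, the following sketch shows one common way such features are extracted. The choice of timm and transformers, and of the bert-base-uncased checkpoint, are assumptions made for illustration, not dependencies documented by the authors:

    import torch
    import timm                                   # assumed library for the ViT backbone
    from transformers import AutoTokenizer, AutoModel

    # Visual features from ViT-B/16 (num_classes=0 returns the pooled feature).
    vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0).eval()
    with torch.no_grad():
        img_feat = vit(torch.randn(1, 3, 224, 224))        # shape (1, 768)

    # Instruction features from BERT (checkpoint name is an assumption).
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased").eval()
    with torch.no_grad():
        enc = tok("walk past the sofa and stop at the door", return_tensors="pt")
        txt_feat = bert(**enc).last_hidden_state           # shape (1, seq_len, 768)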
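Gathered from the Experiment Setup and Hardware Specification rows, the reported hyperparameters fit into a configuration sketch like the one below. The structure and key names are hypothetical; only the values are quoted from the paper, and the pre-training learning rate is not reported:

    # Values quoted from the paper; key names are illustrative only.
    DSRG_CONFIG = {
        "pretrain": {
            "batch_size": 24,
            "iterations": 400_000,
            "hardware": "1x NVIDIA RTX 3090",
        },
        "finetune": {
            "batch_size": 4,
            "learning_rate": 5e-6,
        },
        "architecture": {
            "instruction_layers": 9,     # transformer layers for instructions
            "vis_sem_layers": 2,         # layers for visual and semantic features
            "cross_modal_layers": 4,     # local-global cross-modal attention modules
        },
    }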