A Dual Semantic-Aware Recurrent Global-Adaptive Network for Vision-and-Language Navigation
Authors: Liuyi Wang, Zongtao He, Jiagui Tang, Ronghao Dang, Naijia Wang, Chengju Liu, Qijun Chen
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on the R2R and REVERIE datasets demonstrate that our method achieves better performance than existing methods. |
| Researcher Affiliation | Academia | 1Tongji University, Shanghai, China 2Tongji Artificial Intelligence (Suzhou) Research Institute, Suzhou, China {wly, xingchen327, 2130701, dangronghao, 2030715, liuchengju, qjchen}@tongji.edu.cn |
| Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. Figures present architectural diagrams rather than structured code steps. |
| Open Source Code | Yes | Code is available at https://github.com/CrystalSixone/DSRG. |
| Open Datasets | Yes | To validate our proposed method, we conduct extensive experiments on the R2R [Anderson et al., 2018] and REVERIE datasets [Qi et al., 2020b]. |
| Dataset Splits | Yes | For R2R, four standard metrics are used for evaluation: the navigation error (NE): the distance between the ground truth and the agent's stop position; the success rate (SR): the ratio of paths that stop within 3m from the target points; the oracle success rate (OSR): SR with the oracle stop policy; and the success rate weighted by the path length (SPL): SR penalized by the path length. For REVERIE, another two metrics are added: remote grounding success rate (RGS): the ratio of objects grounded correctly, and the RGS weighted by the path length (RGSPL). ... Table 1: Comparison with the state-of-the-art methods on the R2R dataset. (Includes "Validation Seen" and "Validation Unseen" columns). |
| Hardware Specification | Yes | In the pre-training stage, we train our DSRG with batch size 24 for 400k iterations using 1 NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using a "BERT model" and "ViT-B/16" for feature extraction, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, specific library versions). |
| Experiment Setup | Yes | In the pre-training stage, we train our DSRG with batch size 24 for 400k iterations using 1 NVIDIA RTX 3090 GPU. ... During fine-tuning, the batch size and the learning rate are 4 and 5 × 10−6, respectively. ... The numbers of transformer layers for instructions, visual and semantic features, and local-global cross-modal attention modules are 9, 2 and 4, respectively. |
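The R2R metrics quoted in the Dataset Splits row (NE, SR within 3 m of the goal, and SPL penalizing path length) follow standard VLN definitions. A minimal per-episode sketch is below; the function name, argument names, and input format are illustrative assumptions, not code from the DSRG repository.

```python
def navigation_metrics(nav_error, path_length, shortest_length, success_radius=3.0):
    """Illustrative per-episode VLN metrics (not the paper's implementation).

    nav_error: distance (m) between the agent's stop position and the goal (NE).
    path_length: length (m) of the path the agent actually traversed.
    shortest_length: length (m) of the ground-truth shortest path to the goal.
    """
    # SR: episode succeeds if the agent stops within success_radius of the goal.
    success = 1.0 if nav_error <= success_radius else 0.0
    # SPL: success weighted by shortest path length over the longer of the
    # taken path and the shortest path, so detours reduce the score.
    spl = success * shortest_length / max(path_length, shortest_length)
    return {"NE": nav_error, "SR": success, "SPL": spl}
```

For example, an agent that stops 2 m from the goal but walks 20 m on a 10 m shortest path scores SR = 1.0 but SPL = 0.5, reflecting the path-length penalty described above.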