Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation

Authors: Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on several popular VLN benchmarks... Experimental results show that our AACL outperforms the state-of-the-art approaches on all benchmarks.
Researcher Affiliation | Collaboration | (1) Shenzhen Campus of Sun Yat-sen University, Shenzhen; (2) Huawei Noah's Ark Lab; (3) Peng Cheng Laboratory; (4) Sun Yat-sen University
Pseudocode | No | The paper describes the proposed method in detail through text and figures, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/VLN-AACL.
Open Datasets | Yes | We evaluate AACL on several popular VLN benchmarks with both fine-grained instructions (R2R (Anderson et al. 2018)) and high-level instructions (REVERIE (Qi et al. 2020) and R2R-Last (Chen et al. 2021)).
Dataset Splits | Yes | The dataset is split into train, val seen, val unseen, and test unseen sets with 61, 56, 11, and 18 scenes, respectively.
Hardware Specification | No | The paper mentions implementing the model with the MindSpore Lite tool and acknowledges the Ascend AI Processor, but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper states, 'We implement our model using the MindSpore Lite tool', but does not provide specific version numbers for MindSpore or any other software dependencies.
Experiment Setup | Yes | The batch size is set to 8, 8, and 4 on R2R, R2R-Last, and REVERIE, respectively. The temperature parameter τ is set to 0.5. The loss weight λ1 is set to 0.2 on all datasets, and the loss weight λ2 is set to 1, 1, and 0.01 on R2R, REVERIE, and R2R-Last, respectively. The residual ratio in Eq. 10 is set to 0.8 empirically. During object concept mapping, the top 5 object predictions are retained for each observation. The learning rate of the concept refining adapter is set to 0.1.
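
To make the reported settings easier to scan across the three benchmarks, the sketch below collects them into a small Python dictionary. It is a minimal summary assembled only from the quoted experiment-setup text; the key names (e.g., `temperature_tau`, `adapter_lr`) and the helper `hparams_for` are illustrative assumptions, not identifiers from the authors' Gitee repository.

```python
# Summary of the reported AACL training settings, assembled from the quoted
# experiment-setup text. Key names are hypothetical, not taken from the code release.

PER_DATASET = {
    # Batch sizes: 8 / 8 / 4 on R2R / R2R-Last / REVERIE.
    # Loss weight lambda2: 1 / 1 / 0.01 on R2R / REVERIE / R2R-Last.
    "R2R":      {"batch_size": 8, "lambda2": 1.0},
    "R2R-Last": {"batch_size": 8, "lambda2": 0.01},
    "REVERIE":  {"batch_size": 4, "lambda2": 1.0},
}

SHARED = {
    "temperature_tau": 0.5,      # temperature parameter tau
    "lambda1": 0.2,              # loss weight, same on all datasets
    "residual_ratio": 0.8,       # residual ratio in Eq. 10
    "top_k_object_concepts": 5,  # object predictions kept per observation
    "adapter_lr": 0.1,           # learning rate of the concept refining adapter
}

def hparams_for(dataset: str) -> dict:
    """Merge the shared settings with the dataset-specific ones."""
    return {**SHARED, **PER_DATASET[dataset]}

if __name__ == "__main__":
    # Example: settings reported for the R2R-Last benchmark.
    print(hparams_for("R2R-Last"))
```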