Contrastive Instruction-Trajectory Learning for Vision-Language Navigation
Authors: Xiwen Liang, Fengda Zhu, Yi Zhu, Bingqian Lin, Bing Wang, Xiaodan Liang
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the model with CITL surpasses the previous state-of-the-art methods on R2R, R4R, and RxR. |
| Researcher Affiliation | Collaboration | 1 Shenzhen Campus of Sun Yat-sen University, Shenzhen; 2 Monash University; 3 Huawei Noah's Ark Lab; 4 Alibaba Group |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/liangcici/CITL-VLN. |
| Open Datasets | Yes | The R2R (Anderson et al. 2018b) dataset consists of 90 housing environments. The training set comprises 61 scenes, and the validation unseen set and test unseen set contain 11 and 18 scenes respectively. R4R (Jain et al. 2019) concatenates the trajectories and instructions in R2R. RxR (Ku et al. 2020) is a larger dataset containing more extended instructions and trajectories. |
| Dataset Splits | Yes | The training set comprises 61 scenes, and the validation unseen set and test unseen set contain 11 and 18 scenes respectively. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA 3090 GPU. |
| Software Dependencies | No | The paper mentions the 'MindSpore Lite tool' and 'MindSpore' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In all contrastive losses, the margin m is set to 0.25, and λ1, λ2 and λ3 are fixed to 0.1, 0.01 and 0.01 respectively. The size of all memory banks is fixed to 240. αp and αn are set to 1.2 and 1.4 respectively. Training schedules are the same as baselines (Tan, Yu, and Bansal 2019; Hong et al. 2021). |
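
For readers re-implementing the reported setup, the quoted hyperparameters can be collected into a single configuration object. The sketch below is illustrative only: the field names (`margin`, `lambda1`, `memory_bank_size`, etc.) and the assumption that the three λ values weight auxiliary contrastive losses added to a base navigation loss are ours, not taken verbatim from the paper or the released CITL-VLN code.

```python
from dataclasses import dataclass


@dataclass
class CITLHyperParams:
    """Hyperparameters as quoted in the experiment-setup row above.

    Field names are hypothetical and may not match the released code.
    """
    margin: float = 0.25         # margin m used in all contrastive losses
    lambda1: float = 0.1         # weight of the first contrastive loss term
    lambda2: float = 0.01        # weight of the second contrastive loss term
    lambda3: float = 0.01        # weight of the third contrastive loss term
    memory_bank_size: int = 240  # size of every memory bank
    alpha_p: float = 1.2         # positive-pair scaling factor (alpha_p)
    alpha_n: float = 1.4         # negative-pair scaling factor (alpha_n)


def total_loss(nav_loss: float, c1: float, c2: float, c3: float,
               hp: CITLHyperParams = CITLHyperParams()) -> float:
    """Assumed combination: navigation loss plus weighted contrastive terms."""
    return nav_loss + hp.lambda1 * c1 + hp.lambda2 * c2 + hp.lambda3 * c3
```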