Everyday Object Meets Vision-and-Language Navigation Agent via Backdoor
Authors: Keji He, Kehan Chen, Jiawang Bai, Yan Huang, Qi Wu, Shu-Tao Xia, Liang Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the effectiveness of our method in both physical and digital spaces across different VLN agents, as well as its robustness to various visual and textual variations. |
| Researcher Affiliation | Collaboration | 1 Shandong University; 2 New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 3 School of Artificial Intelligence, University of Chinese Academy of Sciences; 4 Tencent; 5 School of Computer Science, University of Adelaide; 6 Tsinghua Shenzhen International Graduate School, Tsinghua University |
| Pseudocode | No | The paper describes the proposed methods using textual descriptions and equations, but it does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/Chenkehan21/VLN-ATT. |
| Open Datasets | Yes | Regarding the visual environment, we conduct our experiments based on the photo-realistic Matterport3D dataset [8]. ... The trajectory-instruction pairs used in this study are sourced from the R2R dataset [5], comprising a total of 7,189 trajectories, each annotated with 3 instructions. |
| Dataset Splits | Yes | We utilize 61 houses from the training split for navigation or backdoor attack training, and 11 houses from the validation unseen split for test. There is no overlap between these two splits. |
| Hardware Specification | Yes | The average training time is about 6500 minutes on a single NVIDIA V100 GPU. |
| Software Dependencies | No | The paper states that training and testing details are kept consistent with the HAMT [10] and RecBERT [22] baselines, but it does not specify explicit software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | During the pretraining and finetuning phases, we poison 20% of the training data in each batch. For the backdoor attack test, the physical object triggers have been naturally placed at certain points during data collection in the Matterport3D dataset. ... We keep the same training and testing details as the HAMT [10] and RecBERT [22] baselines. |
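
The per-batch poisoning described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the only detail taken from the paper is the 20% per-batch poisoning rate, while the names `poison_batch` and `apply_trigger` (which would paste the object trigger into a view and relabel the target action) are hypothetical.

```python
import random

POISON_RATE = 0.20  # per the paper: 20% of the training data in each batch

def poison_batch(batch, apply_trigger, rate=POISON_RATE, rng=None):
    """Return a copy of `batch` with `rate` of its samples poisoned.

    `batch` is a list of training samples; `apply_trigger` is a
    hypothetical callable mapping a clean sample to its backdoored
    counterpart. The rest of the batch is left unchanged.
    """
    rng = rng or random.Random()
    n_poison = int(len(batch) * rate)
    # Choose which batch positions to poison, without replacement.
    chosen = set(rng.sample(range(len(batch)), n_poison))
    return [apply_trigger(s) if i in chosen else s
            for i, s in enumerate(batch)]
```

For a batch of 10 samples this poisons exactly 2 of them, matching the 20% rate; in practice `apply_trigger` would also switch the supervision to the attacker's target behavior.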