Everyday Object Meets Vision-and-Language Navigation Agent via Backdoor
Authors: Keji He, Kehan Chen, Jiawang Bai, Yan Huang, Qi Wu, Shu-Tao Xia, Liang Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the effectiveness of our method in both physical and digital spaces across different VLN agents, as well as its robustness to various visual and textual variations. |
| Researcher Affiliation | Collaboration | 1 Shandong University; 2 New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 3 School of Artificial Intelligence, University of Chinese Academy of Sciences; 4 Tencent; 5 School of Computer Science, University of Adelaide; 6 Tsinghua Shenzhen International Graduate School, Tsinghua University |
| Pseudocode | No | The paper describes the proposed methods using textual descriptions and equations, but it does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/Chenkehan21/VLN-ATT. |
| Open Datasets | Yes | Regarding the visual environment, we conduct our experiments based on the photo-realistic Matterport3D dataset [8]. ... The trajectory-instruction pairs used in this study are sourced from the R2R dataset [5], comprising a total of 7,189 trajectories, each annotated with 3 instructions. |
| Dataset Splits | Yes | We utilize 61 houses from the training split for navigation or backdoor attack training, and 11 houses from the validation unseen split for test. There is no overlap between these two splits. |
| Hardware Specification | Yes | The average training time is about 6500 minutes on a single NVIDIA V100 GPU. |
| Software Dependencies | No | The paper states that training and testing details are kept consistent with the HAMT [10] and RecBERT [22] baselines, but it does not specify explicit software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | During the pretraining and finetuning phases, we poison 20% of the training data in each batch. For the backdoor attack test, the physical object triggers have been naturally placed at certain points during data collection in the Matterport3D dataset. ... We keep the same training and testing details as the HAMT [10] and RecBERT [22] baselines. |
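
The per-batch poisoning described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the only detail taken from the paper is the 20% per-batch poisoning rate, while the names `poison_batch` and `apply_trigger` (which would paste the object trigger into a view and relabel the target action) are hypothetical.

```python
import random

POISON_RATE = 0.20  # per the paper: 20% of the training data in each batch

def poison_batch(batch, apply_trigger, rate=POISON_RATE, rng=None):
    """Return a copy of `batch` with `rate` of its samples poisoned.

    `batch` is a list of training samples; `apply_trigger` is a
    hypothetical callable mapping a clean sample to its backdoored
    counterpart. The rest of the batch is left unchanged.
    """
    rng = rng or random.Random()
    n_poison = int(len(batch) * rate)
    # Choose which batch positions to poison, without replacement.
    chosen = set(rng.sample(range(len(batch)), n_poison))
    return [apply_trigger(s) if i in chosen else s
            for i, s in enumerate(batch)]
```

For a batch of 10 samples this poisons exactly 2 of them, matching the 20% rate; in practice `apply_trigger` would also switch the supervision to the attacker's target behavior.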