Frequency-Enhanced Data Augmentation for Vision-and-Language Navigation
Authors: Keji He, Chenyang Si, Zhihe Lu, Yan Huang, Liang Wang, Xinchao Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Promising results on R2R, RxR, CVDN and REVERIE demonstrate that our FDA can be readily integrated with existing VLN approaches, improving performance without adding extra parameters, and keeping models simple and efficient. We simply investigate the sensitivity of benchmark methods to low and high-frequency information by perturbing the low-frequency or high-frequency components in images. Three powerful baseline models, i.e., HAMT [9], DUET [10], and TD-STP [64], are used to analyze the significance of low/high-frequency information on both R2R validation seen and unseen splits, wherein the navigation views are disrupted in the Fourier domain. As shown in Figure 2, the three models maintain a relatively high Success Rate (SR) under low-frequency perturbations. |
| Researcher Affiliation | Academia | (1) Center for Research on Intelligent Perception and Computing, National Key Laboratory for Multi-modal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) National University of Singapore; (4) Nanyang Technological University |
| Pseudocode | No | The paper contains diagrams illustrating the proposed approach (e.g., Figure 4), but these are visual representations and not structured pseudocode or algorithm blocks with numbered steps or code-like formatting. |
| Open Source Code | Yes | The code is available at https://github.com/hekj/FDA. |
| Open Datasets | Yes | The visual environments are based on the photo-realistic dataset Matterport3D (MP3D) [6]. Four datasets containing the instruction-trajectory pairs have been adopted: R2R [5], RxR [29], CVDN [52] and REVERIE [45]. |
| Dataset Splits | Yes | There are a total of 90 houses, with 61, 11, and 18 houses allocated for training/validation seen, validation unseen, and test splits, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions general concepts like 'models trained with high-frequency information'. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., programming languages, libraries, or frameworks with their versions) that would be necessary for exact replication. |
| Experiment Setup | No | The paper describes the Frequency-enhanced Data Augmentation (FDA) method and its application, but it does not specify concrete experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific optimizer settings. |
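
The Research Type row mentions perturbing the low-frequency or high-frequency components of navigation views in the Fourier domain. The following is a minimal NumPy sketch of one common way to do that, not the authors' implementation (their code is at the repository linked in the table). The function names `split_frequencies` and `perturb_band`, the circular low-pass mask, the `radius` parameter, and the choice to perturb a band by swapping in the same band from a second "interference" image are all assumptions made for illustration.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split an (H, W) or (H, W, C) image into low- and high-frequency parts.

    Hypothetical helper: a circular mask of the given radius (in frequency
    bins) around the zero frequency defines the "low" band.
    """
    h, w = image.shape[:2]
    # 2-D FFT over the spatial axes, shifted so the zero frequency is centered.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))

    # Distance of every frequency bin from the centered zero frequency.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    if image.ndim == 3:
        low_mask = low_mask[..., None]  # broadcast the mask over channels

    def to_image(spec):
        # Undo the shift and invert the FFT; the masked spectrum stays
        # conjugate-symmetric, so the imaginary part is numerical noise.
        unshifted = np.fft.ifftshift(spec, axes=(0, 1))
        return np.real(np.fft.ifft2(unshifted, axes=(0, 1)))

    return to_image(spectrum * low_mask), to_image(spectrum * ~low_mask)


def perturb_band(image, interference, radius, band):
    """Replace one frequency band of `image` with that of `interference`.

    Assumed perturbation scheme, chosen for illustration only.
    """
    low_img, high_img = split_frequencies(image, radius)
    low_int, high_int = split_frequencies(interference, radius)
    if band == "low":
        return low_int + high_img
    if band == "high":
        return low_img + high_int
    raise ValueError("band must be 'low' or 'high'")
```

For example, `perturb_band(view, other_view, radius=8, band="high")` keeps the low-frequency content of `view` and takes the high-frequency content from `other_view`. Both images must have the same shape, and the value `radius=8` here is arbitrary.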