Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation
Authors: Junyu Gao, Xuan Yao, Changsheng Xu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method obtains impressive performance gains on four popular benchmarks. |
| Researcher Affiliation | Collaboration | 1State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA) 2School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS) 3Peng Cheng Laboratory, Shen Zhen, China. |
| Pseudocode | No | The paper describes its approach using mathematical equations and textual explanations, particularly in Section 3, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | Code is available at https://github.com/Feliciaxyao/ICML2024-FSTTA. |
| Open Datasets | Yes | We use the popular and standard VLN benchmark REVERIE (Qi et al., 2020)... In addition, we also adopt other three benchmarks for evaluating the effectiveness of our proposed FSTTA. Among them, R2R (Anderson et al., 2018)... SOON (Zhu et al., 2021)... R2R-CE (Krantz et al., 2020)... |
| Dataset Splits | Yes | Specifically, for VLN models equipped with TTA strategies, we run the experiments with shuffled samples 5 times and report the average results. ... our method obtains impressive performance gains on four popular benchmarks. ... We re-evaluate our methods on the REVERIE validation seen set. ... To verify the generalizability, we combine the seen and unseen sets into a unified set for online VLN. |
| Hardware Specification | Yes | All experiments are conducted on a RTX 3090 GPU. |
| Software Dependencies | Yes | Our model is implemented with PyTorch 1.7.1 and Python 3.8.5, required packages are listed in our code. |
| Experiment Setup | Yes | To better conform to practical applications, we set batch size to 1 during evaluation... We set the intervals for fast and slow updates to M = 3 and N = 4, the learning rates of the two phases are γ̂(fast) = 6×10⁻⁴ and γ(slow) = 1×10⁻³. For the dynamic learning rate scaling, we empirically set the threshold τ = 0.7 in Eq. (6) and the update momentum ρ = 0.95 with the truncation interval [0.9, 1.1]. And the hyper-parameter q in Eq. (8) is set to 0.1. |
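The quoted setup implies a two-timescale update schedule: a fast parameter update every M = 3 samples, and a periodic slow update every N = 4 fast phases, with learning rates 6×10⁻⁴ and 1×10⁻³. The sketch below illustrates that schedule only; all function and variable names are illustrative assumptions, not the authors' released code, and a scalar parameter stands in for the model's adapted layers.

```python
# Hedged sketch of a fast-slow test-time adaptation schedule
# (M = 3 fast steps per fast phase, a slow update every N = 4 fast
# phases, learning rates 6e-4 / 1e-3, per the paper's stated setup).
# Names and the scalar-parameter simplification are assumptions.

M, N = 3, 4                 # fast-update interval, slow-update interval
LR_FAST, LR_SLOW = 6e-4, 1e-3

def fast_slow_tta(theta, grad_stream):
    """Adapt scalar parameter `theta` online over a stream of gradients."""
    theta_slow = theta        # anchor kept for the slow (periodic) update
    fast_phases = 0
    grads = []
    for g in grad_stream:
        grads.append(g)
        if len(grads) == M:   # fast phase: aggregate M sample gradients
            theta -= LR_FAST * sum(grads) / M
            grads.clear()
            fast_phases += 1
            if fast_phases == N:      # slow phase: move the anchor
                theta_slow += LR_SLOW * (theta - theta_slow)
                theta = theta_slow    # restart fast adaptation from anchor
                fast_phases = 0
    return theta

# Example: a constant gradient of 1.0 over 12 samples
# (4 fast phases, which triggers exactly one slow update).
adapted = fast_slow_tta(0.0, [1.0] * 12)
```

The slow update deliberately pulls the parameters back toward a stable anchor, which is one common way to keep online adaptation from drifting on noisy test streams.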