Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation

Authors: Junyu Gao, Xuan Yao, Changsheng Xu

Venue: ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our method obtains impressive performance gains on four popular benchmarks.
Researcher Affiliation | Collaboration | (1) State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA); (2) School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS); (3) Peng Cheng Laboratory, Shenzhen, China.
Pseudocode | No | The paper describes its approach using mathematical equations and textual explanations, particularly in Section 3, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | Code is available at https://github.com/Feliciaxyao/ICML2024-FSTTA.
Open Datasets | Yes | We use the popular and standard VLN benchmark REVERIE (Qi et al., 2020)... In addition, we also adopt three other benchmarks for evaluating the effectiveness of our proposed FSTTA. Among them, R2R (Anderson et al., 2018)... SOON (Zhu et al., 2021)... R2R-CE (Krantz et al., 2020)...
Dataset Splits | Yes | Specifically, for VLN models equipped with TTA strategies, we run the experiments with shuffled samples 5 times and report the average results. ... our method obtains impressive performance gains on four popular benchmarks. ... We re-evaluate our methods on the REVERIE validation seen set. ... To verify the generalizability, we combine the seen and unseen sets into a unified set for online VLN.
Hardware Specification | Yes | All experiments are conducted on an RTX 3090 GPU.
Software Dependencies | Yes | Our model is implemented with PyTorch 1.7.1 and Python 3.8.5; required packages are listed in our code.
Experiment Setup | Yes | To better conform to practical applications, we set the batch size to 1 during evaluation... We set the intervals for fast and slow updates to M = 3 and N = 4; the learning rates of the two phases are γ̂(fast) = 6 × 10^-4 and γ(slow) = 1 × 10^-3. For the dynamic learning rate scaling, we empirically set the threshold τ = 0.7 in Eq. (6) and the update momentum ρ = 0.95 with the truncation interval [0.9, 1.1]. The hyper-parameter q in Eq. (8) is set to 0.1.
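
For orientation, the fast-slow schedule implied by the reported hyper-parameters can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function online_fast_slow_tta, the sample_stream iterable, and the tta_loss objective are hypothetical placeholders, and the sketch covers only the update cadence (M = 3 fast steps per phase, a slow update every N = 4 phases, batch size 1), omitting the paper's gradient decomposition-accumulation analysis and the dynamic learning-rate scaling governed by τ, ρ, and q. See the linked repository for the actual code.

    import torch

    # Hyper-parameters as reported in the paper's experiment setup.
    M = 3            # steps per fast-update phase
    N = 4            # fast phases between consecutive slow updates
    LR_FAST = 6e-4   # fast-phase learning rate, gamma_hat(fast)
    LR_SLOW = 1e-3   # slow-phase learning rate, gamma(slow)

    def online_fast_slow_tta(model, sample_stream, tta_loss):
        """Hypothetical fast-slow TTA loop: adapt on every test sample
        (batch size 1); after every N fast phases of M steps each, take
        a slow step that moves a set of anchor weights toward the
        fast-adapted weights, then resume adaptation from the anchor."""
        fast_opt = torch.optim.SGD(model.parameters(), lr=LR_FAST)
        anchor = [p.detach().clone() for p in model.parameters()]
        completed_phases = 0
        for step, sample in enumerate(sample_stream, start=1):
            loss = tta_loss(model, sample)      # unsupervised test-time objective
            fast_opt.zero_grad()
            loss.backward()
            fast_opt.step()                     # fast update on every sample
            if step % M == 0:                   # a fast phase just ended
                completed_phases += 1
                if completed_phases % N == 0:   # time for a slow update
                    with torch.no_grad():
                        for p, a in zip(model.parameters(), anchor):
                            a += LR_SLOW * (p - a)   # slow step toward fast weights
                            p.copy_(a)               # restart from the slow anchor

The slow step here is modeled as a simple interpolation of the anchor weights toward the fast-adapted weights; the paper instead derives both the fast and slow updates from an analysis of accumulated gradients and parameter variations.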