Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Active Test-time Vision-Language Navigation

Authors: Heeju Ko, Sung June Kim, Gyeongrok Oh, Jeongyoon YOON, Honglak Lee, Sujin Jang, Seungryong Kim, Sangpil Kim

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on challenging VLN benchmarks REVERIE, R2R, and R2R-CE demonstrate that ATENA successfully overcomes distributional shifts at test time, outperforming the compared baseline methods across various settings. Tables 1, 2, and 3 report experimental results on these datasets.
Researcher Affiliation | Collaboration | 1 Korea University; 2 University of Michigan; 3 Samsung AI Center, DS Division; 4 KAIST AI
Pseudocode | Yes | Algorithm 1 Self-Active Learning for Online Adaptation
Open Source Code | No | The source code will be publicly released upon acceptance of the paper.
Open Datasets | Yes | We conduct experiments on three challenging VLN benchmarks REVERIE [17], R2R [1], and R2R-CE [18].
Dataset Splits | Yes | We conduct experiments on three challenging VLN benchmarks REVERIE [17], R2R [1], and R2R-CE [18]. Table 1 reports the comparisons of the navigation results on the REVERIE dataset... Val Seen Val Unseen Test Unseen. In Table 2, we present the experimental results on the R2R dataset. Table 3: Experimental results on the R2R-CE dataset. Val Seen Val Unseen.
Hardware Specification | No | Detailed descriptions of the computer resources are provided in the appendix. The main body of the paper does not specify hardware used for experiments.
Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers in its main text.
Experiment Setup | No | Detailed descriptions of our experimental settings, including hardware specifications and hyperparameters, are provided in the appendix. The main body of the paper discusses some parameters like 'λ' and 'δ' but does not provide a comprehensive list of hyperparameters or training configurations.