Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Active Test-time Vision-Language Navigation

Authors: Heeju Ko, Sung June Kim, Gyeongrok Oh, Jeongyoon YOON, Honglak Lee, Sujin Jang, Seungryong Kim, Sangpil Kim

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations on challenging VLN benchmarks REVERIE, R2R, and R2R-CE demonstrate that ATENA successfully overcomes distributional shifts at test time, outperforming the compared baseline methods across various settings. Tables 1, 2, and 3 report experimental results on these datasets.
Researcher Affiliation	Collaboration	1 Korea University 2 University of Michigan 3 Samsung AI Center, DS Division 4 KAIST AI
Pseudocode	Yes	Algorithm 1 Self-Active Learning for Online Adaptation
Open Source Code	No	The source code will be publicly released upon acceptance of the paper.
Open Datasets	Yes	We conduct experiments on three challenging VLN benchmarks REVERIE [17], R2R [1], and R2R-CE [18].
Dataset Splits	Yes	We conduct experiments on three challenging VLN benchmarks REVERIE [17], R2R [1], and R2R-CE [18]. Table 1 reports the comparisons of the navigation results on the REVERIE dataset... Val Seen Val Unseen Test Unseen. In Table 2, we present the experimental results on the R2R dataset. Table 3: Experimental results on the R2R-CE dataset. Val Seen Val Unseen.
Hardware Specification	No	Detailed descriptions of the computer resources are provided in the appendix. The main body of the paper does not specify hardware used for experiments.
Software Dependencies	No	The paper does not explicitly mention specific software dependencies with version numbers in its main text.
Experiment Setup	No	Detailed descriptions of our experimental settings, including hardware specifications and hyperparameters, are provided in the appendix. The main body of the paper discusses some parameters like 'λ' and 'δ' but does not provide a comprehensive list of hyperparameters or training configurations.