Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Test3R: Learning to Reconstruct 3D at Test Time

Authors: Yuheng Yuan, Qiuhong Shen, Shizun Wang, Xingyi Yang, Xinchao Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our technique significantly outperforms previous state-of-the-art methods on the 3D reconstruction and multiview depth estimation tasks. We evaluated Test3R on the DUSt3R for 3D reconstruction and multi-view depth estimation. Test3R performs exceptionally well across diverse datasets, improving upon vanilla DUSt3R to achieve competitive or state-of-the-art results in both tasks. We conducted comprehensive experiments across several downstream tasks on the DUSt3R. Quantitative Results. The quantitative evaluation is shown in Table 1. Qualitative Results. The qualitative results are shown in Figure 4.
Researcher Affiliation	Academia	Yuheng Yuan¹, Qiuhong Shen¹, Shizun Wang¹, Xingyi Yang²,¹, Xinchao Wang¹; ¹National University of Singapore, ²The Hong Kong Polytechnic University; EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper describes the method using equations (1) to (8) and textual explanations, but it does not contain a clearly labeled pseudocode block or algorithm section.
Open Source Code	Yes	Code is available at https://github.com/nopQAQ/Test3R.
Open Datasets	Yes	We utilize two scene-level datasets, 7Scenes [65] and NRGBD [66] datasets. Following Robust MVD [62], performances are measured on the object-centric dataset DTU [57] and scene-centric dataset ETH3D [58].
Dataset Splits	Yes	We follow the experiment setting on the CUT3R [35], and employ several commonly used metrics: Accuracy (Acc), Completion (Comp), and Normal Consistency (NC) metrics. Following Robust MVD [62], performances are measured on the object-centric dataset DTU [57] and scene-centric dataset ETH3D [58].
Hardware Specification	No	The NeurIPS Paper Checklist states: "Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: The paper provides sufficient information on the computer resources in appendix." However, the appendix content is not provided in the given text, and the main body of the paper does not specify any hardware details like GPU or CPU models.
Software Dependencies	No	The paper does not explicitly state specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) in the provided text.
Experiment Setup	No	The NeurIPS Paper Checklist states: "Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [Yes] Justification: The detailed experiment settings are listed in Section 5, and appendix." However, the appendix content is not provided in the given text. Section 5 describes the evaluation and baselines but does not provide specific hyperparameters like learning rate, batch size, or detailed optimizer settings for the test-time training.