Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
Authors: Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, Jiaolong Yang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train our model on a large corpus of mixed datasets and conducted comprehensive evaluations, demonstrating its superior performance in achieving accurate relative geometry, precise metric scale, and fine-grained detail recovery capabilities that no previous methods have simultaneously achieved. |
| Researcher Affiliation | Collaboration | Ruicheng Wang1,2 Sicheng Xu2 Yue Dong2 Yu Deng2 Jianfeng Xiang3,2 Zelong Lv1,2 Guangzhong Sun1 Xin Tong2 Jiaolong Yang2 1USTC 2Microsoft Research 3Tsinghua University |
| Pseudocode | No | The paper describes the methodology in Section 3 and its subsections using prose, mathematical equations (e.g., Eq. 1, 2, 3, 4, 8, 9, 10, 11, 12, 13, 15), and architectural diagrams (e.g., Figure 2, Figure 3, Figure A.1), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and instructions are accessible via an anonymous link in the supplementary material. |
| Open Datasets | Yes | We train our model using a combination of 24 datasets with 16 synthetic datasets [10, 58, 49, 59, 42, 33, 14, 24, 83, 51, 63, 21, 1, 65, 55, 73], 3 LiDAR scanned datasets [17, 64, 53], and 5 SfM-reconstructed datasets [3, 78, 70, 34, 69]. All datasets are publicly available for academic use, and their sampling weights follow the protocol established in MoGe [61]. |
| Dataset Splits | No | The paper lists numerous datasets used for training and evaluation in Table A.1 and Section 4.1. It states "We train our model using a combination of 24 datasets" and "We evaluate the accuracy of our method on 10 datasets." However, it does not explicitly provide the train/validation/test splits (e.g., specific percentages or sample counts) for its combined training data, nor does it detail how existing dataset splits were specifically utilized for its own training process. |
| Hardware Specification | Yes | The full model is trained for 120K iterations with 32 NVIDIA A100 GPUs for 120 hours. Ablation models are trained for 100K iterations. |
| Software Dependencies | No | The paper mentions using DINOv2 as the image encoder backbone and various datasets and models from related works, citing their respective papers. However, it does not explicitly list specific software dependencies with their version numbers (e.g., "PyTorch 1.9", "CUDA 11.1") within the provided text. |
| Experiment Setup | Yes | The models are trained with initial learning rates of $1 \times 10^{-5}$ for the ViT backbone and $1 \times 10^{-4}$ for the neck and heads. The learning rate decays by half every 25K steps. The full model is trained for 120K iterations with 32 NVIDIA A100 GPUs for 120 hours. Ablation models are trained for 100K iterations. |
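
The learning-rate schedule quoted in the Experiment Setup row can be illustrated with a small sketch. This is not the authors' training code: it assumes a staircase schedule in which the rate is halved exactly at each 25K-step boundary, and the names `step_decay_lr`, `BACKBONE_LR`, and `HEAD_LR` are hypothetical. Only the initial rates, the halving interval, and the 120K iteration count come from the quoted text.

```python
# Illustrative sketch only; the staircase behavior at step boundaries is an assumption.

def step_decay_lr(initial_lr: float, step: int,
                  decay_every: int = 25_000, factor: float = 0.5) -> float:
    """Return the learning rate at `step` when it is multiplied by `factor`
    once every `decay_every` steps (staircase decay)."""
    return initial_lr * factor ** (step // decay_every)


# Values quoted in the table: backbone vs. neck and heads.
BACKBONE_LR = 1e-5   # ViT backbone
HEAD_LR = 1e-4       # neck and heads
TOTAL_STEPS = 120_000

if __name__ == "__main__":
    for step in (0, 25_000, 50_000, 100_000, TOTAL_STEPS - 1):
        print(step,
              step_decay_lr(BACKBONE_LR, step),
              step_decay_lr(HEAD_LR, step))
```

Under these assumptions the rate is halved four times over the 120K iterations of the full model, so each parameter group finishes at one sixteenth of its initial rate.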