Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion
Authors: Bardienus Duisterhof, Jan Oberst, Bowen Wen, Stan Birchfield, Deva Ramanan, Jeffrey Ichnowski
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate RaySt3R on synthetic and real-world datasets, and observe it achieves state-of-the-art performance, outperforming the baselines on all datasets by up to 44% in 3D chamfer distance. Project page: rayst3r.github.io |
| Researcher Affiliation | Collaboration | Bardienus P. Duisterhof (Carnegie Mellon University), Jan Oberst (Carnegie Mellon University), Bowen Wen (NVIDIA), Stan Birchfield (NVIDIA), Deva Ramanan (Carnegie Mellon University), Jeffrey Ichnowski (Carnegie Mellon University) |
| Pseudocode | No | The paper describes the model architecture and training objectives in Section 4.1 and 4.2 using text and mathematical equations, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper gives a project page (rayst3r.github.io), and the reproducibility checklist states: "We also plan to release the code, model checkpoints and dataset of this paper, with useful instructions to help adoption." |
| Open Datasets | Yes | We leverage existing synthetic datasets from FoundationPose [52] and OctMAE [18]. For both datasets, we use the Objaverse and GSO meshes to render depth maps from novel views. Our dataset spans 251k unique scenes, 12k objects, and 11M novel depth maps rendered for supervision. We evaluate RaySt3R on synthetic and real-world datasets. Following OctMAE [18], we evaluate on subsets of evaluation splits of the YCB-Video [56] (900 frames), HOPE [42] (50 frames), and HomebrewedDB [21] (1,000 frames) datasets. |
| Dataset Splits | Yes | We evaluate on subsets of evaluation splits of the YCB-Video [56] (900 frames), HOPE [42] (50 frames), and HomebrewedDB [21] (1,000 frames) datasets. For results on synthetic data, we evaluate on the evaluation split of the OctMAE [18] dataset (1,000 frames). |
| Hardware Specification | Yes | We train RaySt3R on 8 80-GB A100 GPUs for 18 epochs, totaling approximately 20 million scene iterations. Inference takes less than 1.2 seconds on a single RTX 4090 GPU, and can be further reduced by querying fewer views. |
| Software Dependencies | No | We use a ViT-B model with patch size 16, embedding dimension 768, 12 heads, 12 cross-attention layers, but 4 self-attention layers to save on compute. We select the ViT-L with registers for DINOv2 [33]. We use an AdamW optimizer [28]. No specific version numbers for software libraries (e.g., PyTorch, TensorFlow) or programming languages are provided. |
| Experiment Setup | Yes | We set the batch size to 10 per GPU, and a learning rate of 1.5 × 10⁻⁴ with a half-cosine learning-rate schedule, starting with one warm-up epoch and using an AdamW optimizer [28]. We set λbb = 1.3 and λcam = 0.7 for all real-world datasets, and λbb = 2.5 and λcam = 1.2 for the OctMAE dataset. ... We set the confidence threshold τ = 5 for all experiments, and sample 22 views in total. During training we set the confidence parameter α = 0.2, λmask = 0.1. |
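
The learning-rate schedule quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code, which had not been released at the time of classification. The base learning rate, epoch count, warm-up length, GPU count, and per-GPU batch size come from the table above; the linear shape of the warm-up, the decay to zero, the absence of gradient accumulation, and the names `lr_at` and `effective_batch` are assumptions made for illustration.

```python
import math

# Values reported in the table above.
BASE_LR = 1.5e-4       # peak learning rate
TOTAL_EPOCHS = 18      # training length
WARMUP_EPOCHS = 1      # one warm-up epoch
GPUS = 8               # A100 GPUs used for training
BATCH_PER_GPU = 10     # batch size per GPU


def lr_at(epoch: float, min_lr: float = 0.0) -> float:
    """Learning rate at a (possibly fractional) epoch.

    Assumptions (not stated in the source): the warm-up is linear from 0,
    and the half-cosine decays to `min_lr` at the final epoch.
    """
    if epoch < WARMUP_EPOCHS:
        # Linear ramp from 0 up to the peak learning rate.
        return BASE_LR * epoch / WARMUP_EPOCHS
    # Fraction of the post-warm-up training that has elapsed, clamped to 1.
    progress = min((epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS), 1.0)
    # Half of a cosine period: falls smoothly from the peak to min_lr.
    return min_lr + (BASE_LR - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


# Samples per optimizer step, assuming no gradient accumulation.
effective_batch = GPUS * BATCH_PER_GPU

if __name__ == "__main__":
    for epoch in (0, 0.5, 1, 9.5, 18):
        print(f"epoch {epoch:>4}: lr = {lr_at(epoch):.3e}")
    print("effective batch size:", effective_batch)
```

Under these assumptions the rate peaks at the end of the first epoch and reaches half its peak value midway through the remaining 17 epochs. A reproduction would need the released code to confirm the exact warm-up shape and final learning rate.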