Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Cameras as Rays: Pose Estimation via Ray Diffusion
Authors: Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed methods, both regression and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D while generalizing to unseen object categories and in-the-wild captures. |
| Researcher Affiliation | Academia | Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani; Carnegie Mellon University |
| Pseudocode | No | The paper includes figures and equations but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project Page: https://jasonyzhang.com/RayDiffusion. (This project page links to the GitHub repository https://github.com/jasonyzhang/ray-diffusion). |
| Open Datasets | Yes | Our method is trained and evaluated using CO3Dv2 (Reizenstein et al., 2021). |
| Dataset Splits | No | The paper mentions training on 41 categories, holding out 10 for generalization, and evaluating by randomly sampling N images from test sequences. However, it does not report the train/validation/test split within the dataset as percentages or sample counts. |
| Hardware Specification | Yes | The ray regression and ray diffusion models take about 2 and 4 days respectively to train on 8 A6000 GPUs. All benchmarks are completed using a single Nvidia A6000 GPU. |
| Software Dependencies | No | The paper mentions using 'pre-trained, frozen DINOv2 (S/14)' and 'DiT with 16 transformer blocks' but does not specify versions for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Following Lin et al. (2024), we place the world origin at the point closest to the optical axes of the training cameras, which represents a useful inductive bias for center-facing camera setups. We use a DiT (Peebles & Xie, 2023) with 16 transformer blocks as the architecture for both f_Regress (with t always set to 100) and f_Diffusion. We train our diffusion model with T=100 timesteps. For all experiments, we use the X_0 predicted at T = 30. |
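The origin placement described in the Experiment Setup row reduces to a small least-squares problem. The sketch below is an illustration only, not code from the authors' repository. It assumes each training camera is given as a center c_i and an optical-axis direction d_i in world coordinates, and that "closest" means minimizing the summed squared distance from a point to all axes. The helper names `closest_point_to_axes` and `recenter_cameras` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Sequence[float]) -> Vec3:
    """Scale a 3-vector to unit length."""
    n = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    if n == 0.0:
        raise ValueError("optical-axis direction must be non-zero")
    return (v[0] / n, v[1] / n, v[2] / n)


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def closest_point_to_axes(
    centers: Sequence[Sequence[float]], directions: Sequence[Sequence[float]]
) -> Vec3:
    """Hypothetical helper: point p minimizing sum_i ||(I - d_i d_i^T)(p - c_i)||^2.

    Each optical axis is the line c_i + s * d_i; (I - d_i d_i^T) removes the
    component along the axis, so each summand is the squared distance from p
    to axis i. Setting the gradient to zero gives A p = b with
    A = sum_i (I - d_i d_i^T) and b = sum_i (I - d_i d_i^T) c_i.
    """
    if len(centers) != len(directions):
        raise ValueError("need one direction per camera center")
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        u = _normalize(d)
        for r in range(3):
            for k in range(3):
                proj = (1.0 if r == k else 0.0) - u[r] * u[k]  # entry of I - u u^T
                A[r][k] += proj
                b[r] += proj * c[k]
    det = _det3(A)
    if abs(det) < 1e-12:  # absolute threshold; adequate for a sketch
        raise ValueError("axes are (nearly) parallel; no unique closest point")
    # Cramer's rule: x_j = det(A with column j replaced by b) / det(A).
    x = [
        _det3([[b[r] if k == j else A[r][k] for k in range(3)] for r in range(3)]) / det
        for j in range(3)
    ]
    return (x[0], x[1], x[2])


def recenter_cameras(
    centers: Sequence[Sequence[float]], directions: Sequence[Sequence[float]]
) -> List[Vec3]:
    """Hypothetical helper: translate centers so the closest point becomes the origin."""
    p = closest_point_to_axes(centers, directions)
    return [(c[0] - p[0], c[1] - p[1], c[2] - p[2]) for c in centers]
```

For example, four cameras at (3, 2, 3), (-1, 2, 3), (1, 4, 3), and (1, 2, 5) looking along -x, +x, -y, and -z all have axes through (1, 2, 3), so `closest_point_to_axes` returns (1.0, 2.0, 3.0) and `recenter_cameras` shifts them to (2, 0, 0), (-2, 0, 0), (0, 2, 0), and (0, 0, 2). Cramer's rule keeps the sketch dependency-free; with NumPy, `numpy.linalg.solve` (or `numpy.linalg.lstsq` for near-degenerate setups) would be the more idiomatic solver.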