Cameras as Rays: Pose Estimation via Ray Diffusion
Authors: Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed methods, both regression and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D while generalizing to unseen object categories and in-the-wild captures. |
| Researcher Affiliation | Academia | Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani — Carnegie Mellon University |
| Pseudocode | No | The paper includes figures and equations but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project Page: https://jasonyzhang.com/RayDiffusion. (This project page links to the GitHub repository https://github.com/jasonyzhang/ray-diffusion). |
| Open Datasets | Yes | Our method is trained and evaluated using CO3Dv2 (Reizenstein et al., 2021). |
| Dataset Splits | No | The paper mentions training on 41 categories and holding out 10 for generalization, and evaluating by randomly sampling N images from test sequences. However, it does not specify explicit train/validation/test splits (as percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | The ray regression and ray diffusion models take about 2 and 4 days respectively to train on 8 A6000 GPUs. All benchmarks are completed using a single Nvidia A6000 GPU. |
| Software Dependencies | No | The paper mentions using 'pre-trained, frozen DINOv2 (S/14)' and 'DiT with 16 transformer blocks' but does not specify versions for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Following Lin et al. (2024), we place the world origin at the point closest to the optical axes of the training cameras, which represents a useful inductive bias for center-facing camera setups. We use a DiT (Peebles & Xie, 2023) with 16 transformer blocks as the architecture for both f_regress (with t always set to 100) and f_diffusion. We train our diffusion model with T=100 timesteps. For all experiments, we use the x0 predicted at timestep t = 30. |
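The early-stopping detail in the Experiment Setup row (train with T=100 diffusion timesteps, but return the denoiser's x0 prediction at t=30 instead of sampling all the way to t=0) can be sketched as a minimal DDPM-style ancestral sampler. This is an illustrative sketch, not the authors' code: the linear beta schedule, the toy `denoise_fn` interface, and the generic ray-parameter shape are all assumptions.

```python
import numpy as np

def ddpm_sample(denoise_fn, shape, T=100, t_stop=30, seed=0):
    """Minimal DDPM-style ancestral sampler (a sketch, not the paper's code).

    Runs reverse diffusion from t = T-1 down to t_stop and returns the
    x0 prediction at t_stop, mirroring the setup note that the x0
    predicted at t = 30 is used as the final estimate.
    """
    rng = np.random.default_rng(seed)
    # Linear beta schedule (an assumption; the paper does not specify it here).
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # start from pure noise
    for t in range(T - 1, t_stop - 1, -1):
        eps_hat = denoise_fn(x, t)  # model's predicted noise at step t
        # x0 estimate implied by the noise prediction.
        x0_hat = (x - np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])
        if t == t_stop:
            return x0_hat  # stop early: return predicted x0 at t_stop
        # Posterior mean for one ancestral step x_t -> x_{t-1}.
        mean = (np.sqrt(alphas[t]) * (1.0 - alpha_bars[t - 1]) * x
                + np.sqrt(alpha_bars[t - 1]) * betas[t] * x0_hat) / (1.0 - alpha_bars[t])
        x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x
```

In the paper's setting `denoise_fn` would be the DiT-based f_diffusion operating on per-patch ray parameters; here `shape` stands in for a generic batch of such parameters.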