Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

WildCAT3D: Appearance-Aware Multi-View Diffusion in the Wild

Authors: Morris Alper, David Novotny, Filippos Kokkinos, Hadar Averbuch-Elor, Tom Monnier

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We exhaustively compare our method on standard NVS benchmarks and demonstrate superior performance, while training on fewer curated data sources than prior works. ... NVS metrics are provided in Table 1, comparing to the recent SOTA MegaScenes NVS model (MS NVS) along with prior models reproduced from [Tung et al., 2024].
Researcher Affiliation Collaboration Morris Alper (1, 2), David Novotny (2), Filippos Kokkinos (2), Hadar Averbuch-Elor (3), Tom Monnier (2). Affiliations: (1) Tel Aviv University, (2) Meta AI, (3) Cornell University.
Pseudocode No The paper describes the method and architecture in Section 3, titled 'WildCAT3D', using prose and block diagrams like Figure 2, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No Answer: [No] Justification: While we do not release our code or model weights, we provide full experimental details to enable reproduction of our results.
Open Datasets Yes Our training data sources are all licensed under permissive licenses: Re10K and Megascenes under CC BY 4.0, and CO3D under CC BY-NC 4.0.
Dataset Splits Yes To calculate generative metrics (FID, KID) on MegaScenes, we use a random 15K-item subset of the test set to make their calculation computationally feasible.
Hardware Specification Yes For all model training, we use batch size 64 (where each sample in a mini-batch is itself a set of eight scene views), distributed over 32 NVIDIA A100 GPUs (using approximately 80GB of memory on each) on our internal cluster.
Software Dependencies No For WildCAT3D's pretrained image generative backbone, we use an open-source LDM similar to [Rombach et al., 2022]. We calculate Plücker raymaps as follows: given camera origin o and pixel p (vectors in world coordinates), its raw 6-dimensional coordinates are given by (d, o × d), where d = p − o is its displacement from the camera origin. While specific tools like Depth Anything and COLMAP are mentioned, the exact version numbers used in the implementation are not provided.
Experiment Setup Yes Our appearance module has the following architecture: It is made up of alternating convolutional layers (filter size 3, same padding) and 2×2 max pooling, with filter dimensions 16, 16, 16, 4, 2 respectively. ... By default we use v = 8 (slots for views), training with one observed and seven unobserved randomly-selected scene views. ... We train on 512×512 pixel resolution (64×64 latent resolution)... During inference, we generate images using CFG scale 3. ... For all model training, we use batch size 64... The initial CAT3D model... is trained for 200K iterations, followed by 60K WildCAT3D fine-tuning iterations.
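
The Plücker raymap computation quoted under Software Dependencies can be illustrated with a short sketch. It assumes the standard reading of the formula: the displacement is d = p − o and the second component is the cross product o × d. The function names (`cross`, `plucker_ray`, `plucker_raymap`) are hypothetical and not taken from the paper, which releases no code. Any normalization or scaling the authors may apply to the raw coordinates is not covered here.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plucker_ray(origin, pixel):
    """Raw 6-D Plucker coordinates (d, o x d) for one pixel.

    Assumption: d = pixel - origin, both in world coordinates.
    """
    d = tuple(p - o for p, o in zip(pixel, origin))
    moment = cross(origin, d)
    return d + moment  # tuple concatenation gives a 6-tuple


def plucker_raymap(origin, pixels):
    """Apply plucker_ray to every pixel of a grid given as a list of rows."""
    return [[plucker_ray(origin, p) for p in row] for row in pixels]


if __name__ == "__main__":
    # Illustrative values only: camera at (1, 0, 0), a 1x2 grid of pixel positions.
    grid = [[(1.0, 1.0, 0.0), (1.0, 0.0, 1.0)]]
    for row in plucker_raymap((1.0, 0.0, 0.0), grid):
        print(row)
```

Each pixel yields six numbers, so a raymap for an H×W grid has shape H×W×6. When the camera sits at the world origin, the moment term o × d is zero and only the direction component carries information.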