Novel View Synthesis with Diffusion Models

Authors: Daniel Watson, William Chan, Ricardo Martin-Brualla, Jonathan Ho, Andrea Tagliasacchi, Mohammad Norouzi

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We benchmark 3DiMs on the SRN ShapeNet dataset (Sitzmann et al., 2019) to allow comparisons with prior work on novel view synthesis from a single image." |
| Researcher Affiliation | Industry | Daniel Watson (Google Research, Brain), William Chan (Google Research, Brain), Ricardo Martin-Brualla (Google Research), Jonathan Ho (Google Research, Brain), Andrea Tagliasacchi (Google Research, Brain), Mohammad Norouzi (Google Research, Brain) |
| Pseudocode | Yes | "In order to maximize the reproducibility of our results, we provide code in JAX (Bradbury et al., 2018) for our proposed X-UNet neural architecture from Section 2.3." |
| Open Source Code | Yes | "In order to maximize the reproducibility of our results, we provide code in JAX (Bradbury et al., 2018) for our proposed X-UNet neural architecture from Section 2.3." |
| Open Datasets | Yes | "We benchmark 3DiMs on the SRN ShapeNet dataset (Sitzmann et al., 2019) to allow comparisons with prior work on novel view synthesis from a single image." |
| Dataset Splits | No | No explicit training/validation/test dataset splits with percentages or sample counts are provided for the main model training. |
| Hardware Specification | Yes | "we could not fit ch=512 in TPUv4 memory without model parallelism" |
| Software Dependencies | No | "In order to maximize the reproducibility of our results, we provide code in JAX (Bradbury et al., 2018) for our proposed X-UNet neural architecture from Section 2.3." |
| Experiment Setup | Yes | "For our neural architecture, our main experiments use ch=256 (~471M params), and we also experiment with ch=448 (~1.3B params) in Section 4. One of our early findings that we kept throughout all experiments in the paper is that ch_mult=(1, 2, 2, 4)... We use a learning rate with peak value 0.0001... We use a global batch size of 128." |
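The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is illustrative only: the field names (`XUNetConfig`, `param_widths`) are hypothetical and not taken from the paper's released JAX code; only the values (ch=256, ch_mult=(1, 2, 2, 4), peak learning rate 0.0001, global batch size 128) come from the quoted text.

```python
# Hypothetical configuration sketch for the quoted X-UNet hyperparameters.
# Field names are illustrative, not the paper's actual config schema.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class XUNetConfig:
    ch: int = 256                              # base channel width (~471M params; ch=448 ~1.3B)
    ch_mult: Tuple[int, ...] = (1, 2, 2, 4)    # per-resolution channel multipliers
    peak_lr: float = 1e-4                      # peak learning-rate value
    global_batch_size: int = 128               # global batch size


def level_widths(cfg: XUNetConfig) -> list:
    """Channel width at each UNet resolution level (ch * multiplier)."""
    return [cfg.ch * m for m in cfg.ch_mult]


# With the defaults above, the per-level widths are 256, 512, 512, 1024.
print(level_widths(XUNetConfig()))
```

With ch_mult=(1, 2, 2, 4), the network widens by 4x at its coarsest resolution, which is consistent with the paper noting that the larger ch=512 variant could not fit in TPUv4 memory without model parallelism.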