WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space

Authors: Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate WildFusion on multiple image generation benchmarks, including ImageNet, and find that it outperforms recent state-of-the-art GAN-based methods. We provide ablation studies in Sec. 4.3.
Researcher Affiliation | Collaboration | University of Tübingen; NVIDIA; Vector Institute; University of Toronto
Pseudocode | No | The paper provides detailed descriptions of its models and algorithms, including mathematical equations, but does not include any formally labeled pseudocode or algorithm blocks/figures.
Open Source Code | No | The paper states 'See https://katjaschwarz.github.io/wildfusion/ for videos of our 3D results.', which is a project page, not a code repository. It also mentions building on other open-source projects, but does not release its own code.
Open Datasets | Yes | Hence, we use non-aligned datasets with complex geometry: SDIP Dogs, Elephants, Horses (Mokady et al., 2022; Yu et al., 2015) as well as class-conditional ImageNet (Deng et al., 2009).
Dataset Splits | No | For the autoencoder, we measure reconstruction via learned perceptual image patch similarity (LPIPS) (Zhang et al., 2018) and quantify novel-view quality with Fréchet Inception Distance (nvFID) (Heusel et al., 2017) on 1000 held-out dataset images. All evaluations use a held-out test set. While evaluation is done on held-out images, the paper does not specify train/validation splits or percentages for the main training dataset. (See the evaluation sketch below the table.)
Hardware Specification | Yes | We train all autoencoders with a batch size of 32 on 8 NVIDIA A100-PCIE-40GB GPUs until the discriminator has seen around 5.5M training images. Our LDMs are trained on 4 NVIDIA A100-PCIE-40GB GPUs for 8 hours on SDIP elephant and for 1 day on SDIP horse, dog.
Software Dependencies | No | Our code base builds on the official PyTorch implementation of StyleGAN (Karras et al., 2019) available at https://github.com/NVlabs/stylegan3, EG3D (Chan et al., 2022) available at https://github.com/NVlabs/eg3d, and LDM (Rombach et al., 2021) available at https://github.com/CompVis/latent-diffusion. While PyTorch is mentioned, no specific version is provided for PyTorch or any other software dependency.
Experiment Setup | Yes | The autoencoder uses Adam (Kingma & Ba, 2015) with a learning rate of 1.4 × 10⁻⁴. ... We train all autoencoders with a batch size of 32 ... We provide detailed model and training hyperparameter choices in Table 6. (See the training-setup sketch below the table.)
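
The Dataset Splits row quotes the paper's evaluation protocol: LPIPS for autoencoder reconstructions and FID for novel views, computed on 1000 held-out images. The sketch below shows how such metrics are commonly computed with the lpips and torchmetrics packages; these package choices, the image sizes, and the random placeholder tensors are assumptions for illustration, not the paper's released evaluation code.

```python
# Hedged sketch of the held-out evaluation quoted above: LPIPS for
# reconstruction quality, FID for novel-view quality. `lpips` and
# `torchmetrics` are assumed package choices; random tensors stand in
# for the 1000 held-out images and the model outputs.
import torch
import lpips
from torchmetrics.image.fid import FrechetInceptionDistance

held_out = torch.rand(16, 3, 128, 128) * 2 - 1  # placeholder images in [-1, 1]
recon = torch.rand(16, 3, 128, 128) * 2 - 1     # placeholder reconstructions
novel = torch.rand(16, 3, 128, 128) * 2 - 1     # placeholder novel views

# LPIPS (Zhang et al., 2018): perceptual distance between reconstruction and target.
lpips_fn = lpips.LPIPS(net='alex')
lpips_score = lpips_fn(recon, held_out).mean()

# FID (Heusel et al., 2017) between held-out images and rendered novel views;
# torchmetrics expects uint8 images in [0, 255].
to_uint8 = lambda x: (x * 127.5 + 127.5).clamp(0, 255).to(torch.uint8)
fid = FrechetInceptionDistance(feature=64)
fid.update(to_uint8(held_out), real=True)
fid.update(to_uint8(novel), real=False)

print(f"LPIPS: {lpips_score.item():.4f}  nvFID: {fid.compute().item():.2f}")
```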
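
Similarly, the Experiment Setup row names Adam (Kingma & Ba, 2015) with a learning rate of 1.4 × 10⁻⁴ and a batch size of 32. Below is a minimal PyTorch sketch of just that optimizer configuration; the stub autoencoder and the MSE reconstruction loss are hypothetical placeholders, since the paper's actual architecture, losses, and remaining hyperparameters are given in its Table 6.

```python
# Hedged sketch of the reported optimizer setup: Adam, lr = 1.4e-4,
# batch size 32. The stub module and MSE loss are placeholders; they
# are not the paper's autoencoder or training objective.
import torch
import torch.nn as nn

class StubAutoencoder(nn.Module):
    """Stand-in for the paper's autoencoder (its Table 6 lists the real design)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(8, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = StubAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1.4e-4)  # reported learning rate

batch = torch.randn(32, 3, 64, 64)  # reported batch size of 32
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(batch), batch)  # placeholder objective
loss.backward()
optimizer.step()
```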