WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space
Authors: Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate WildFusion on multiple image generation benchmarks, including ImageNet, and find that it outperforms recent state-of-the-art GAN-based methods. We provide ablation studies in Sec. 4.3. |
| Researcher Affiliation | Collaboration | 1University of Tübingen, 2NVIDIA, 3Vector Institute, 4University of Toronto |
| Pseudocode | No | The paper provides detailed descriptions of its models and algorithms, including mathematical equations, but does not include any formally labeled pseudocode or algorithm blocks/figures. |
| Open Source Code | No | The paper states 'See https://katjaschwarz.github.io/wildfusion/ for videos of our 3D results.' which is a project page, not a direct code repository. It also mentions building on other open-source projects, but not releasing its own code. |
| Open Datasets | Yes | Hence, we use non-aligned datasets with complex geometry: SDIP Dogs, Elephants, Horses (Mokady et al., 2022; Yu et al., 2015) as well as class-conditional ImageNet (Deng et al., 2009). |
| Dataset Splits | No | For the autoencoder, we measure reconstruction via learned perceptual image patch similarity (LPIPS) (Zhang et al., 2018) and quantify novel-view quality with Fréchet Inception Distance (nvFID) (Heusel et al., 2017) on 1000 held-out dataset images. All evaluations are performed on a held-out test set. While evaluation is done on held-out images, the paper does not specify train/validation splits or percentages for the main training dataset (see the evaluation sketch after this table). |
| Hardware Specification | Yes | We train all autoencoders with a batch size of 32 on 8 NVIDIA A100-PCIE-40GB GPUs until the discriminator has seen around 5.5M training images. Our LDMs are trained on 4 NVIDIA A100-PCIE-40GB GPUs for 8 hours on SDIP elephant and for 1 day on SDIP horse, dog. |
| Software Dependencies | No | Our code base builds on the official PyTorch implementation of StyleGAN (Karras et al., 2019) available at https://github.com/NVlabs/stylegan3, EG3D (Chan et al., 2022) available at https://github.com/NVlabs/eg3d, and LDM (Rombach et al., 2021) available at https://github.com/CompVis/latent-diffusion. While PyTorch is mentioned, no specific version is provided for PyTorch or any other software dependency. |
| Experiment Setup | Yes | The autoencoder uses Adam (Kingma & Ba, 2015) with a learning rate of 1.4 × 10⁻⁴. ... We train all autoencoders with a batch size of 32 ... We provide detailed model and training hyperparameter choices in Table 6. (A training-loop sketch follows this table.) |
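
The evaluation protocol quoted above (LPIPS for reconstructions, nvFID on 1000 held-out images for novel views) can be approximated with off-the-shelf metric implementations. Below is a minimal sketch, assuming the third-party `lpips` and `torchmetrics` packages, which the paper does not name; the function and variable names are illustrative, not the authors' code.

```python
# Hedged sketch of the paper's autoencoder evaluation: LPIPS (Zhang et al., 2018)
# for reconstruction quality and FID (Heusel et al., 2017) for novel-view quality.
# Package choices (`lpips`, `torchmetrics`) are assumptions, not from the paper.
import torch
import lpips
from torchmetrics.image.fid import FrechetInceptionDistance

lpips_fn = lpips.LPIPS(net="alex")            # perceptual distance network
fid = FrechetInceptionDistance(feature=2048)  # Inception-v3 pool3 features

def evaluate(real_imgs: torch.Tensor, recon_imgs: torch.Tensor,
             novel_views: torch.Tensor) -> tuple[float, float]:
    """All inputs are (N, 3, H, W) tensors scaled to [-1, 1]."""
    # Reconstruction quality: mean LPIPS between inputs and reconstructions.
    rec_lpips = lpips_fn(real_imgs, recon_imgs).mean().item()

    # Novel-view quality: FID between held-out images and rendered novel views.
    # torchmetrics' FID expects uint8 images in [0, 255] by default.
    to_uint8 = lambda x: ((x + 1) * 127.5).clamp(0, 255).to(torch.uint8)
    fid.update(to_uint8(real_imgs), real=True)
    fid.update(to_uint8(novel_views), real=False)
    nv_fid = fid.compute().item()
    return rec_lpips, nv_fid
```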
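
The reported optimization setup (Adam at 1.4 × 10⁻⁴, batch size 32, training until the discriminator has seen roughly 5.5M images) maps onto a standard PyTorch loop. The sketch below uses a trivial stand-in model, random placeholder data, and a reconstruction loss purely for illustration; the paper's actual autoencoder architecture, adversarial and perceptual objectives, and full hyperparameters are given in its Table 6.

```python
# Hedged sketch of the reported training configuration, assuming PyTorch.
# The model, data, and loss below are placeholders, not WildFusion's method.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(  # hypothetical stand-in for the paper's autoencoder
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1.4e-4)  # reported lr

batch_size = 32            # reported batch size
target_images = 5_500_000  # train until the discriminator has seen ~5.5M images

images_seen = 0
while images_seen < target_images:
    batch = torch.randn(batch_size, 3, 64, 64)   # placeholder for a data loader
    recon = autoencoder(batch)
    # Stand-in objective; the paper combines adversarial and perceptual losses.
    loss = nn.functional.mse_loss(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    images_seen += batch_size
```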