WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space

Authors: Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate WildFusion on multiple image generation benchmarks, including ImageNet, and find that it outperforms recent state-of-the-art GAN-based methods. We provide ablation studies in Sec. 4.3.
Researcher Affiliation | Collaboration | University of Tübingen; NVIDIA; Vector Institute; University of Toronto
Pseudocode | No | The paper provides detailed descriptions of its models and algorithms, including mathematical equations, but does not include any formally labeled pseudocode or algorithm blocks/figures.
Open Source Code | No | The paper states 'See https://katjaschwarz.github.io/wildfusion/ for videos of our 3D results.', which is a project page, not a code repository. It also mentions building on other open-source projects, but does not release its own code.
Open Datasets | Yes | Hence, we use non-aligned datasets with complex geometry: SDIP Dogs, Elephants, Horses (Mokady et al., 2022; Yu et al., 2015) as well as class-conditional ImageNet (Deng et al., 2009).
Dataset Splits | No | For the autoencoder, we measure reconstruction via learned perceptual image patch similarity (LPIPS) (Zhang et al., 2018) and quantify novel-view quality with Fréchet Inception Distance (nvFID) (Heusel et al., 2017) on 1000 held-out dataset images. All evaluations use a held-out test set. While evaluation is done on held-out images, the paper does not specify train/validation splits or percentages for the main training dataset. (See the evaluation sketch below the table.)
Hardware Specification | Yes | We train all autoencoders with a batch size of 32 on 8 NVIDIA A100-PCIE-40GB GPUs until the discriminator has seen around 5.5M training images. Our LDMs are trained on 4 NVIDIA A100-PCIE-40GB GPUs for 8 hours on SDIP elephant and for 1 day on SDIP horse, dog.
Software Dependencies | No | Our code base builds on the official PyTorch implementation of StyleGAN (Karras et al., 2019) available at https://github.com/NVlabs/stylegan3, EG3D (Chan et al., 2022) available at https://github.com/NVlabs/eg3d, and LDM (Rombach et al., 2021) available at https://github.com/CompVis/latent-diffusion. While PyTorch is mentioned, no specific version is provided for PyTorch or any other software dependency.
Experiment Setup | Yes | The autoencoder uses Adam (Kingma & Ba, 2015) with a learning rate of 1.4 × 10⁻⁴. ... We train all autoencoders with a batch size of 32 ... We provide detailed model and training hyperparameter choices in Table 6. (See the training-setup sketch below the table.)
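
The Dataset Splits row quotes the paper's evaluation protocol: LPIPS for autoencoder reconstructions and FID for novel views, computed on 1000 held-out images. The sketch below shows how such metrics are commonly computed with the lpips and torchmetrics packages; these package choices, the image sizes, and the random placeholder tensors are assumptions for illustration, not the paper's released evaluation code.

```python
# Hedged sketch of the held-out evaluation quoted above: LPIPS for
# reconstruction quality, FID for novel-view quality. `lpips` and
# `torchmetrics` are assumed package choices; random tensors stand in
# for the 1000 held-out images and the model outputs.
import torch
import lpips
from torchmetrics.image.fid import FrechetInceptionDistance

held_out = torch.rand(16, 3, 128, 128) * 2 - 1  # placeholder images in [-1, 1]
recon = torch.rand(16, 3, 128, 128) * 2 - 1     # placeholder reconstructions
novel = torch.rand(16, 3, 128, 128) * 2 - 1     # placeholder novel views

# LPIPS (Zhang et al., 2018): perceptual distance between reconstruction and target.
lpips_fn = lpips.LPIPS(net='alex')
lpips_score = lpips_fn(recon, held_out).mean()

# FID (Heusel et al., 2017) between held-out images and rendered novel views;
# torchmetrics expects uint8 images in [0, 255].
to_uint8 = lambda x: (x * 127.5 + 127.5).clamp(0, 255).to(torch.uint8)
fid = FrechetInceptionDistance(feature=64)
fid.update(to_uint8(held_out), real=True)
fid.update(to_uint8(novel), real=False)

print(f"LPIPS: {lpips_score.item():.4f}  nvFID: {fid.compute().item():.2f}")
```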
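
Similarly, the Experiment Setup row names Adam (Kingma & Ba, 2015) with a learning rate of 1.4 × 10⁻⁴ and a batch size of 32. Below is a minimal PyTorch sketch of just that optimizer configuration; the stub autoencoder and the MSE reconstruction loss are hypothetical placeholders, since the paper's actual architecture, losses, and remaining hyperparameters are given in its Table 6.

```python
# Hedged sketch of the reported optimizer setup: Adam, lr = 1.4e-4,
# batch size 32. The stub module and MSE loss are placeholders; they
# are not the paper's autoencoder or training objective.
import torch
import torch.nn as nn

class StubAutoencoder(nn.Module):
    """Stand-in for the paper's autoencoder (its Table 6 lists the real design)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(8, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = StubAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1.4e-4)  # reported learning rate

batch = torch.randn(32, 3, 64, 64)  # reported batch size of 32
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(batch), batch)  # placeholder objective
loss.backward()
optimizer.step()
```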