VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids
Authors: Katja Schwarz, Axel Sauer, Michael Niemeyer, Yiyi Liao, Andreas Geiger
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that monolithic MLPs can indeed be replaced by 3D convolutions when combining sparse voxel grids with progressive growing, free space pruning and appropriate regularization. To obtain a compact representation of the scene and allow for scaling to higher voxel resolutions, our model disentangles the foreground object (modeled in 3D) from the background (modeled in 2D). |
| Researcher Affiliation | Academia | Katja Schwarz¹, Axel Sauer¹, Michael Niemeyer¹, Yiyi Liao², Andreas Geiger¹ — ¹University of Tübingen and Max Planck Institute for Intelligent Systems, Tübingen; ²Zhejiang University, China |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code and models are available at https://github.com/autonomousvision/voxgraf. |
| Open Datasets | Yes | The synthetic Carla dataset [8, 37] contains 10k images and camera poses of 18 car models with randomly sampled colors. FFHQ [19] comprises 70k aligned face images. AFHQv2 Cats [5] consists of 4834 cat faces. |
| Dataset Splits | No | The paper mentions using "the full dataset" for FID evaluation and augmenting datasets, but does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) used for training the model itself. |
| Hardware Specification | Yes | Depending on the dataset, we train our models for 3 to 7 days on 8 Tesla V100 GPUs. For all runtime comparisons, we report times on a single Tesla V100 GPU with a batch size of 1. |
| Software Dependencies | No | The paper mentions using 'custom CUDA kernels', the 'Minkowski Engine' library, and the 'StyleGAN2' architecture, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We train our approach with Adam [21] using a batch size of 64 at grid resolutions RG = 32 and 64, and a batch size of 32 at RG = 128. We use a learning rate of 0.0025 for the generator and 0.002 for the discriminator. |
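The reported training hyperparameters can be gathered into a small configuration helper for reproduction attempts. This is a sketch based only on the values quoted above (Adam optimizer, generator learning rate 0.0025, discriminator learning rate 0.002, batch size 64 at grid resolutions RG = 32 and 64, batch size 32 at RG = 128); the function name and dictionary layout are illustrative assumptions, not taken from the authors' code.

```python
# Hedged sketch of the VoxGRAF training hyperparameters as reported in the paper.
# The helper name `training_config` and the dict structure are our own invention.

ADAM_LR_GENERATOR = 0.0025      # generator learning rate (from paper)
ADAM_LR_DISCRIMINATOR = 0.002   # discriminator learning rate (from paper)

def training_config(grid_resolution: int) -> dict:
    """Return the reported optimizer settings and batch size for a voxel grid resolution RG."""
    if grid_resolution in (32, 64):
        batch_size = 64         # batch size 64 at RG = 32 and RG = 64
    elif grid_resolution == 128:
        batch_size = 32         # batch size 32 at RG = 128
    else:
        raise ValueError(f"No batch size reported for RG = {grid_resolution}")
    return {
        "optimizer": "Adam",
        "lr_generator": ADAM_LR_GENERATOR,
        "lr_discriminator": ADAM_LR_DISCRIMINATOR,
        "batch_size": batch_size,
    }
```

For example, `training_config(128)["batch_size"]` returns 32, matching the reduced batch size the paper reports at the highest grid resolution.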