Alias-Free Generative Adversarial Networks
Authors: Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 3: Results for FFHQ-U (unaligned FFHQ) at 256². Left: Training configurations. FID is computed between 50k generated images and all training images [23, 28]; lower is better. EQ-T and EQ-R are our equivariance metrics in decibels (dB); higher is better. Right: Parameter ablations using our final configuration (R) for the filter's support, magnification around nonlinearities, and the minimum stopband frequency at the first layer. Figure 5: Left: Results for six datasets. We use adaptive discriminator augmentation (ADA) [28] for the smaller datasets. StyleGAN2 corresponds to our baseline config B with Fourier features. Right: Ablations and comparisons for FFHQ-U (unaligned FFHQ) at 256². (Sketches of the FID and decibel-scale equivariance metrics follow this table.) |
| Researcher Affiliation | Collaboration | Tero Karras (NVIDIA, tkarras@nvidia.com); Miika Aittala (NVIDIA, maittala@nvidia.com); Samuli Laine (NVIDIA, slaine@nvidia.com); Erik Härkönen (Aalto University and NVIDIA, erik.harkonen@aalto.fi); Janne Hellsten (NVIDIA, jhellsten@nvidia.com); Jaakko Lehtinen (NVIDIA and Aalto University, jlehtinen@nvidia.com); Timo Aila (NVIDIA, taila@nvidia.com) |
| Pseudocode | No | The paper includes diagrams of the generator architecture, but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation and pre-trained models are available at https://github.com/NVlabs/stylegan3 |
| Open Datasets | Yes | In addition to the standard FFHQ [29] and METFACES [28], we created unaligned versions of them. We also created a properly resampled version of AFHQ [14] and collected a new BEACHES dataset. |
| Dataset Splits | No | The paper mentions using datasets like FFHQ, METFACES, and AFHQ, but does not explicitly provide details about the train/validation/test splits used for these datasets in the context of their experiments. |
| Hardware Specification | Yes | This entire project consumed 92 GPU years and 225 MWh of electricity on an in-house cluster of NVIDIA V100s. |
| Software Dependencies | No | The paper references popular deep learning frameworks like TensorFlow [1] and PyTorch [39] and mentions implementing a custom CUDA kernel, but does not provide specific version numbers for these or other software dependencies used in their experiments. |
| Experiment Setup | Yes | Our contributions include the surprising finding that current upsampling filters are simply not aggressive enough in suppressing aliasing, and that extremely high-quality filters with over 100 dB attenuation are required. Further, we present a principled solution to aliasing caused by pointwise nonlinearities [5] by considering their effect in the continuous domain and appropriately low-pass filtering the results. We also show that after the overhaul, a model based on 1×1 convolutions yields a strong, rotation-equivariant generator. We use a windowed sinc filter with a relatively large Kaiser window [35] of size n = 6, meaning that each output pixel is affected by 6 input pixels in upsampling and each input pixel affects 6 output pixels in downsampling. In practice, we find m = 2 to be sufficient (Figure 3, right), again improving EQ-T (config F). Figure 5 gives results for six datasets using StyleGAN2 [30] as well as our alias-free StyleGAN3-T and StyleGAN3-R generators. In addition to the standard FFHQ [29] and METFACES [28], we created unaligned versions of them. We also created a properly resampled version of AFHQ [14] and collected a new BEACHES dataset. Appendix B describes the datasets in detail. The results show that our FID remains competitive with StyleGAN2. StyleGAN3-T and StyleGAN3-R perform equally well in terms of FID, and both show a very high level of translation equivariance. As expected, only the latter provides rotation equivariance. In FFHQ (1024×1024) the three generators had 30.0M, 22.3M and 15.8M parameters, while the training times were 1106, 1576 (+42%) and 2248 GPU hours. (A filter-construction sketch follows this table.) |
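
The Research Type and Experiment Setup rows above rely on FID computed between 50k generated images and the full training set. As a reminder of the metric itself (the standard Fréchet inception distance, not a detail specific to this paper), FID compares Gaussians fitted to Inception-v3 features of the real and generated image sets:

```latex
\mathrm{FID} = \left\lVert \mu_r - \mu_g \right\rVert_2^2
             + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

where (μ_r, Σ_r) and (μ_g, Σ_g) are the feature mean and covariance of the real and generated images, respectively; lower is better.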
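The EQ-T and EQ-R values quoted in the table are reported in decibels; the paper describes them as PSNR-like scores comparing the output obtained by transforming the generator's input against the transformed generator output. The sketch below only illustrates such a decibel-scale comparison for the translation case; `generator`, `translate_input`, and `translate_image` are hypothetical placeholders rather than the paper's API, and the exact expectation and normalization follow the paper.

```python
import numpy as np

def equivariance_db(img_a: np.ndarray, img_b: np.ndarray, dynamic_range: float = 2.0) -> float:
    """PSNR-style score in dB between two images scaled to [-1, 1].

    Higher means the two orders of operations agree more closely
    (illustrative only; not the paper's exact EQ-T/EQ-R definition).
    """
    mse = np.mean((img_a - img_b) ** 2)
    return 10.0 * np.log10(dynamic_range ** 2 / mse)

# Hypothetical usage: compare "translate the input, then generate"
# against "generate, then translate the image".
# a = generator(translate_input(z, dx, dy))
# b = translate_image(generator(z), dx, dy)
# print(equivariance_db(a, b))
```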
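The Experiment Setup row quotes the windowed sinc filter with a Kaiser window of size n = 6, where each output pixel is affected by 6 input pixels during 2× upsampling. The 1D sketch below shows one way to build such a filter and apply it; the cutoff, Kaiser β, and the 2·n tap count at the output rate are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np
from scipy.signal.windows import kaiser

def kaiser_sinc_filter(support: int = 6, cutoff: float = 0.5, beta: float = 8.0) -> np.ndarray:
    """Kaiser-windowed sinc low-pass filter.

    `support` is the number of input samples influencing each output sample;
    at the 2x-upsampled rate the filter therefore spans 2 * support taps.
    `cutoff` is given as a fraction of the output Nyquist frequency.
    """
    n_taps = 2 * support
    t = np.arange(n_taps) - (n_taps - 1) / 2.0   # symmetric tap positions
    h = cutoff * np.sinc(cutoff * t)             # ideal low-pass impulse response
    h *= kaiser(n_taps, beta)                    # taper with a Kaiser window
    return h / h.sum()                           # normalize to unit DC gain

def upsample_2x(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """2x upsampling: insert zeros between samples, then low-pass filter (1D illustration)."""
    up = np.zeros(2 * len(x))
    up[::2] = x
    return 2.0 * np.convolve(up, h, mode="same")  # factor 2 restores signal amplitude
```

Downsampling is the mirror operation: low-pass filter at the input rate, then keep every second sample.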