Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

Authors: Zifan Shi, Yinghao Xu, Yujun Shen, Deli Zhao, Qifeng Chen, Dit-Yan Yeung

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various generator architectures and training datasets verify the superiority of GeoD over state-of-the-art alternatives. Moreover, our approach is registered as a general framework such that a more capable discriminator (i.e., with a third task of novel view synthesis beyond domain classification and geometry extraction) can further assist the generator with a better multi-view consistency.
Researcher Affiliation | Collaboration | Zifan Shi¹, Yinghao Xu², Yujun Shen³, Deli Zhao³, Qifeng Chen¹, Dit-Yan Yeung¹ (¹HKUST, ²CUHK, ³Ant Group)
Pseudocode | No | The paper provides mathematical formulas and descriptions of methods but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | Code will be made publicly available later.
Open Datasets | Yes | We evaluate the proposed GeoD on three real-world unstructured datasets, including FFHQ [14], AFHQ cat [4], and LSUN bedroom [35]. FFHQ contains unique 70K high-resolution real images of human faces. All images are aligned and cropped following [14]. AFHQ cat includes around 5K images of various cat faces in different poses. We use the cat face detector to get the landmarks and align images following [14]. Images are then cropped to keep the face in the center. There are about 3M images in the LSUN bedroom. The images are captured in various camera views. We use center-cropping to preprocess the images.
Dataset Splits | No | The paper describes dataset usage and evaluation metrics but does not provide specific train/validation/test dataset splits (e.g., percentages or counts) or reference predefined splits for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions various models and frameworks (e.g., NeRF, Unsup3d, IBRNet) but does not list specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow, CUDA versions).
Experiment Setup | Yes | We follow the training protocol of the baselines. For human and cat faces, GeoD is trained from scratch along with the original GAN pipeline. As a result of the complexity of indoor scenes, it is difficult to learn reasonable geometries from monocular images only. Therefore, for GeoD of scenes, we pretrain the geometry branch on synthetic data [18] and NYU dataset [21] following Li et al. [18]. The resolution of the geometry branch in GeoD is 64 × 64 for faces and 256 × 256 for scenes. Images are resized to satisfy this requirement. The entire training ensures the discriminator to see 25000K real images. Due to the expensive rendering process, π-GAN is trained with images of resolution 64 × 64, and VolumeGAN on LSUN bedroom is trained on 128 × 128 images. Other experiments are conducted on the resolution of 256 × 256. More details are available in Supplementary Material.
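
The resolution rules and training budget quoted in the Experiment Setup row can be condensed into a small lookup. The Python sketch below is illustrative only: the function names, the lowercase generator and dataset labels, and the batch size in the usage note are assumptions introduced here and do not come from the paper or its (unreleased) code.

```python
import math

# Hypothetical lowercase labels for the datasets named in the summary.
FACE_DATASETS = {"ffhq", "afhq_cat"}
SCENE_DATASETS = {"lsun_bedroom"}

# "25000K real images" seen by the discriminator over the whole training run.
TOTAL_REAL_IMAGES = 25_000 * 1_000


def image_resolution(generator: str, dataset: str) -> int:
    """Training image side length, following the rules quoted in the summary."""
    if generator == "pi-gan":
        return 64  # pi-GAN is trained on 64 x 64 images (expensive rendering)
    if generator == "volumegan" and dataset == "lsun_bedroom":
        return 128  # VolumeGAN on LSUN bedroom uses 128 x 128 images
    return 256  # all other experiments use 256 x 256


def geometry_resolution(dataset: str) -> int:
    """Side length of the geometry branch input: 64 for faces, 256 for scenes."""
    if dataset in FACE_DATASETS:
        return 64
    if dataset in SCENE_DATASETS:
        return 256
    raise ValueError(f"unknown dataset label: {dataset!r}")


def geometry_branch_pretrained(dataset: str) -> bool:
    """Only the scene setting pretrains the geometry branch; faces train from scratch."""
    return dataset in SCENE_DATASETS


def discriminator_steps(batch_size: int) -> int:
    """Number of steps needed to show the discriminator the full image budget.

    The batch size is not reported in the summary, so it is a caller-supplied
    assumption.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(TOTAL_REAL_IMAGES / batch_size)
```

For example, with a hypothetical batch size of 32, discriminator_steps(32) evaluates to 781250. Note that image_resolution("volumegan", "lsun_bedroom") is 128 while geometry_resolution("lsun_bedroom") is 256, which is the case where the summary says images are resized to match the geometry branch.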