StyleGAN knows Normal, Depth, Albedo, and More

Authors: Anand Bhattad, Daniel McKee, Derek Hoiem, David Forsyth

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper demonstrates that StyleGAN can easily be induced to produce intrinsic images. The intrinsic images obtained from StyleGAN compare well both qualitatively and quantitatively with those obtained by using SOTA image regression techniques; but StyleGAN's intrinsic images are robust to relighting effects, unlike SOTA methods. As Section 5 shows, the intrinsic images recovered compare very well to those produced by robust image regression methods [28, 10, 18, 20], both qualitatively and quantitatively.
Researcher Affiliation | Academia | Anand Bhattad, Daniel McKee, Derek Hoiem, D. A. Forsyth; University of Illinois Urbana-Champaign
Pseudocode | No | The paper includes diagrams and descriptions of the approach (e.g., Figure 2) but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a project URL (https://anandbhattad.github.io/stylegan knows/) in the author list, but it does not contain an unambiguous statement that the authors are releasing the code for the work described in the paper, nor does it directly link to a code repository within the text.
Open Datasets | Yes | We use a pretrained model from Yu et al. [66], and that remains unaltered during the entire latent search process. Off-the-shelf networks are exclusively utilized for latent discovery, not for any part of the StyleGAN training. We use a recent SOTA, self-supervised image decomposition model [20] on the IIW dataset [9] to guide our search for albedo and shading latents. We employ a supervised SOTA model, EVA-2 [18], the top-performing segmentation method on the ADE20k benchmark [73], to guide our search for segmentation latents.
Dataset Splits | No | The paper mentions using '2000 unique scenes or generated images' to find directions and evaluating on '214 test scenes', but it does not specify explicit train/validation/test dataset splits (e.g., percentages or counts) for its own experimental process.
Hardware Specification | Yes | Overall time to find one intrinsic image direction is less than 2 minutes on an A40 GPU. In total, less than 24 hours of a single A40 GPU were required for the final reported experiments, and less than 200 hours of a single A40 GPU were required from ideation to final experiments.
Software Dependencies | No | The paper refers to various models and frameworks like 'StyleGAN', 'Omnidata-v2', 'Zoe-depth', and 'EVA-2', but it does not specify software dependencies with version numbers (e.g., Python version, specific library versions).
Experiment Setup | Yes | We directly search for specific perturbations or offsets, denoted as d(c), which, when added to the intermediate latent code w+, i.e., w+_c = w+ + d(c), yield the desired intrinsic scene properties. To search for these offsets, we utilize off-the-shelf pretrained networks from Omnidata-v2 [28] for surface normals, Zoe-depth [10] for depth, EVA-2 [18] for semantic segmentation, and Paradigms for intrinsic image decomposition [20] to compute the desired scene properties for the generated image x = G(z). We employ a Huber loss (smooth L1 loss) to measure the difference between the generated intrinsics and the off-the-shelf network's predicted intrinsics. We found that incorporating task-specific losses, such as angular loss for normal prediction, slightly improves the quality of predicted intrinsic images.
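
To make the setup above concrete, here is a minimal PyTorch-style sketch of the latent offset search, assuming a frozen pretrained StyleGAN generator G that maps w+ codes to images and a frozen off-the-shelf predictor (Omnidata-v2, Zoe-depth, EVA-2, or the intrinsic decomposition model) supplying the target intrinsic for each generated scene. All function and variable names are illustrative; the paper does not publish code or an API.

```python
import torch
import torch.nn.functional as F

def find_intrinsic_offset(G, predictor, w_plus_codes, num_steps=500, lr=0.05):
    """Search for one offset d(c) such that G(w+ + d(c)) reproduces the
    intrinsic image (normals, depth, segmentation, albedo, or shading)
    that the off-the-shelf predictor estimates for G(w+).
    Both G and the predictor stay frozen; only d(c) is optimized."""
    # One shared offset per intrinsic property, shaped like a single w+ code.
    d_c = torch.zeros_like(w_plus_codes[:1], requires_grad=True)
    opt = torch.optim.Adam([d_c], lr=lr)

    for _ in range(num_steps):
        opt.zero_grad()
        loss = torch.zeros((), device=d_c.device)
        for w_plus in w_plus_codes:            # batch of generated scenes
            w_plus = w_plus.unsqueeze(0)
            with torch.no_grad():
                image = G(w_plus)              # x = G(w+), frozen generator
                target = predictor(image)      # off-the-shelf intrinsic estimate
            pred = G(w_plus + d_c)             # StyleGAN rendering the intrinsic
            # Huber (smooth L1) loss between generated and predicted intrinsics;
            # task-specific terms (e.g., angular loss on normals) could be added.
            loss = loss + F.smooth_l1_loss(pred, target)
        loss.backward()
        opt.step()
    return d_c.detach()
```

In practice the forward passes would be batched and the target intrinsics cached outside the optimization loop; the per-scene loop is kept only for readability.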