StyleGAN knows Normal, Depth, Albedo, and More
Authors: Anand Bhattad, Daniel McKee, Derek Hoiem, David Forsyth
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper demonstrates that StyleGAN can easily be induced to produce intrinsic images. The intrinsic images obtained from StyleGAN compare well both qualitatively and quantitatively with those obtained by using SOTA image regression techniques; but StyleGAN's intrinsic images are robust to relighting effects, unlike SOTA methods. As Section 5 shows, the intrinsic images recovered compare very well to those produced by robust image regression methods [28, 10, 18, 20], both qualitatively and quantitatively. |
| Researcher Affiliation | Academia | Anand Bhattad, Daniel McKee, Derek Hoiem, D.A. Forsyth; University of Illinois Urbana-Champaign |
| Pseudocode | No | The paper includes diagrams and descriptions of the approach (e.g., Figure 2) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project URL (https://anandbhattad.github.io/stylegan-knows/) in the author list, but it does not contain an unambiguous sentence stating that the authors are releasing the code for the work described in the paper, nor does it directly link to a code repository within the text. |
| Open Datasets | Yes | We use a pretrained model from Yu et al. [66] and that remains unaltered during the entire latent search process. Off-the-shelf networks are exclusively utilized for latent discovery, not for any part of the StyleGAN training. We use a recent SOTA, self-supervised image decomposition model [20] on the IIW dataset [9] to guide our search for albedo and shading latents. We employ a supervised SOTA model, EVA-2 [18], the top-performing segmentation method on the ADE20k benchmark [73], to guide our search for segmentation latents. |
| Dataset Splits | No | The paper mentions using '2000 unique scenes or generated images' to find directions and evaluating on '214 test scenes', but it does not specify explicit train/validation/test dataset splits (e.g., percentages or counts) for its own experimental process. |
| Hardware Specification | Yes | Overall time to find one intrinsic image direction is less than 2 minutes on an A40 GPU. In total, less than 24 hours of a single A40 GPU were required for the final reported experiments, and less than 200 hours of a single A40 GPU were required from ideation to final experiments. |
| Software Dependencies | No | The paper refers to various models and frameworks like 'StyleGAN', 'Omnidata-v2', 'Zoe-depth', and 'EVA-2', but it does not specify software dependencies with version numbers (e.g., Python version, specific library versions). |
| Experiment Setup | Yes | We directly search for specific perturbations or offsets, denoted as d(c), which, when added to the intermediate latent code w+, i.e., w+_c = w+ + d(c), yield the desired intrinsic scene properties. To search for these offsets, we utilize off-the-shelf pretrained networks from Omnidata-v2 [28] for surface normals, Zoe-depth [10] for depth, EVA-2 [18] for semantic segmentation, and Paradigms [20] for intrinsic image decomposition to compute the desired scene properties for the generated image x = G(z). We employ a Huber loss (smooth L1 loss) to measure the difference between the generated intrinsics and the off-the-shelf network's predicted intrinsics. We found that incorporating task-specific losses, such as angular loss for normal prediction, slightly improves the quality of predicted intrinsic images. |
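
The "Experiment Setup" row describes a straightforward optimization: freeze the StyleGAN generator, synthesize images, run an off-the-shelf predictor on them, and optimize an offset d(c) in w+ space so that the generator reproduces the predicted intrinsic. The paper does not release code, so the sketch below is only a minimal, hypothetical PyTorch reconstruction of that loop under stated assumptions: the `generator.mapping` / `generator.synthesis` interface and the `guidance_net` wrapper are placeholders, not the authors' implementation.

```python
# Hypothetical reconstruction of the latent-offset search described above.
# Assumptions: `generator` exposes StyleGAN-style mapping/synthesis calls, and
# `guidance_net` is an off-the-shelf predictor (e.g., a surface-normal network)
# whose output is rendered as a 3-channel map at the generator's resolution.
import torch
import torch.nn.functional as F

def find_intrinsic_offset(generator, guidance_net, z_batch, num_steps=500, lr=1e-2):
    """Search for a single offset d(c) in w+ space that makes the frozen
    generator emit an intrinsic image matching the guidance network."""
    with torch.no_grad():
        w_plus = generator.mapping(z_batch)    # (B, num_layers, 512) latents
        images = generator.synthesis(w_plus)   # ordinary generated images
        targets = guidance_net(images)         # predicted intrinsics (targets)

    # One shared offset for the whole batch; the generator stays frozen.
    offset = torch.zeros_like(w_plus[:1], requires_grad=True)
    optimizer = torch.optim.Adam([offset], lr=lr)

    for _ in range(num_steps):
        optimizer.zero_grad()
        intrinsic = generator.synthesis(w_plus + offset)
        loss = F.smooth_l1_loss(intrinsic, targets)  # Huber loss, as in the paper
        loss.backward()
        optimizer.step()

    return offset.detach()
```

In the reported setup, the guidance network is Omnidata-v2 (normals), Zoe-depth (depth), EVA-2 (segmentation), or the self-supervised decomposition model of [20] (albedo/shading), and the search for one direction reportedly takes under two minutes on an A40 GPU.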