Multistable Shape from Shading Emerges from Patch Diffusion

Authors: Xinran Han, Todd Zickler, Ko Nishino

NeurIPS 2024

Reproducibility Variable: Result — LLM Response
Research Type: Experimental — "We train a small denoising diffusion process to generate surface normal fields from 16×16 patches of synthetic images of everyday 3D objects. We deploy this model patch-wise at multiple scales, with guidance from inter-patch shape consistency constraints. Despite its relatively small parameter count and predominantly bottom-up structure, we show that multistable shape explanations emerge from this model for ambiguous test images that humans experience as being multistable."
Researcher Affiliation: Academia — Xinran Nicole Han (Harvard University, xinranhan@g.harvard.edu), Todd Zickler (Harvard University, zickler@seas.harvard.edu), Ko Nishino (Kyoto University, kon@i.kyoto-u.ac.jp)
Pseudocode: Yes — "We provide the pseudocode for the single-scale spatial-consistency-guided sampling (Alg. 1) and the lighting consistency guidance (Alg. 2). The hyperparameters are λ, which weights the smoothness and integrability losses; η_t, the guidance update weight; and J_t, the number of noise update steps. The results in our paper use λ = 0.5 and J_t = 3. The parameter η_t is resolution-dependent and is included with the schedule specification in Appendix A.10."
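The guided noise update described above can be sketched in a few lines. This is a minimal illustration, not the paper's Alg. 1: as assumptions, it operates on a surface slope field (p, q) instead of a full normal map, uses simple finite-difference smoothness and integrability terms, and fixes η_t to a constant (in the paper it is resolution-dependent).

```python
import torch

def consistency_loss(pq: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """pq: (2, H, W) surface slopes (p, q). Smoothness + lam * integrability."""
    p, q = pq[0], pq[1]
    # Smoothness: squared finite differences of each slope channel.
    smooth = ((pq[:, 1:, :] - pq[:, :-1, :]) ** 2).mean() \
           + ((pq[:, :, 1:] - pq[:, :, :-1]) ** 2).mean()
    # Integrability: a valid height field requires dp/dy == dq/dx.
    p_y = p[1:, :-1] - p[:-1, :-1]
    q_x = q[:-1, 1:] - q[:-1, :-1]
    integ = ((p_y - q_x) ** 2).mean()
    return smooth + lam * integ

def guided_noise_update(x_t: torch.Tensor, eta_t: float = 0.05,
                        J_t: int = 3) -> torch.Tensor:
    """Take J_t gradient steps on the consistency loss (the guidance step
    inserted between denoising iterations)."""
    x = x_t.clone()
    for _ in range(J_t):
        x = x.detach().requires_grad_(True)
        loss = consistency_loss(x)
        (grad,) = torch.autograd.grad(loss, x)
        x = x - eta_t * grad
    return x.detach()
```

Each DDIM step would interleave one call to `guided_noise_update` with the model's denoising prediction, so the sample is pulled toward spatially consistent, integrable shape explanations.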
Open Source Code: Yes — "We include in the supplemental material the implementation of the main algorithms in the paper."
Open Datasets: Yes — "We train the pixel-space conditional diffusion model on a dataset that we build from the UniPS dataset [26]. It contains about 8000 256×256 synthetic images of 400 unique objects from the Adobe3D Assets [1], rendered from different viewing directions."
Dataset Splits: No — The paper does not explicitly state a validation split or validation sample counts; only the training and test sets are described.
Hardware Specification: Yes — "It takes about 40 hours using one Nvidia A100 GPU." Reported runtimes: 105 s and 125 s on a single Quadro RTX 8000.
Software Dependencies: No — The paper mentions a UNet, the AdamW optimizer, and a cosine variance schedule, but does not specify version numbers or other ancillary software dependencies.
Experiment Setup: Yes — "We train it using patches of size d×d extracted from rendered images of the 3D objects in [26], curated from Adobe Stock. We use Lambertian shading from random light directions, with a random albedo in [0.5, 1] and without cast shadows or global illumination effects. ... At inference time, we use the DDIM sampler [48] with 50 sampling steps and with guidance. ... The model is trained using the AdamW optimizer for 500 epochs with learning rate 2e-4."
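The quoted optimizer and the cosine variance schedule mentioned by the paper are standard components; a minimal sketch of that setup follows. The stand-in `Conv2d` module is an assumption (the paper's UNet architecture is not given here), while the AdamW optimizer, learning rate 2e-4, and the cosine schedule form (Nichol & Dhariwal) are as stated or conventional.

```python
import math
import torch

def cosine_alpha_bar(T: int = 1000, s: float = 0.008) -> torch.Tensor:
    """Cosine variance schedule: cumulative product alpha_bar(t),
    alpha_bar(0) = 1, decreasing smoothly to near 0 at t = T."""
    t = torch.arange(T + 1, dtype=torch.float64) / T
    f = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    return (f / f[0]).float()

# Stand-in for the paper's patch denoising UNet (architecture not specified here).
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
# As quoted: AdamW with learning rate 2e-4, trained for 500 epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
alpha_bar = cosine_alpha_bar()
```

At inference, a DDIM sampler would subsample this schedule to 50 steps and apply the consistency guidance between denoising updates.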