Multistable Shape from Shading Emerges from Patch Diffusion
Authors: Xinran Han, Todd Zickler, Ko Nishino
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train a small denoising diffusion process to generate surface normal fields from 16×16 patches of synthetic images of everyday 3D objects. We deploy this model patch-wise at multiple scales, with guidance from inter-patch shape consistency constraints. Despite its relatively small parameter count and predominantly bottom-up structure, we show that multistable shape explanations emerge from this model for ambiguous test images that humans experience as being multistable. |
| Researcher Affiliation | Academia | Xinran Nicole Han, Harvard University (xinranhan@g.harvard.edu); Todd Zickler, Harvard University (zickler@seas.harvard.edu); Ko Nishino, Kyoto University (kon@i.kyoto-u.ac.jp) |
| Pseudocode | Yes | We provide the pseudocode for the single-scale spatial consistency guided sampling (Alg. 1) and the lighting consistency guidance (Alg. 2). The hyperparameters are λ, weighting the smoothness and integrability loss; η_t, the guidance update weight; and J_t, the number of noise update steps. The results in our paper use λ = 0.5 and J_t = 3. The parameter η_t is resolution-dependent and is included with the schedule specification in Appendix A.10. |
| Open Source Code | Yes | We include in the supplemental material the implementation of the main algorithms in the paper. |
| Open Datasets | Yes | We train the pixel-space conditional diffusion model on a dataset that we build from the UniPS dataset [26]. It contains about 8,000 256×256 synthetic images of 400 unique objects from the Adobe3D Assets [1] rendered from different viewing directions. |
| Dataset Splits | No | The paper does not explicitly state validation splits or sample counts for validation, only training and test. |
| Hardware Specification | Yes | Training takes about 40 hours using one Nvidia A100 GPU. ... Inference runtime: 105s and 125s (single Quadro RTX 8000). |
| Software Dependencies | No | The paper mentions 'UNet', 'AdamW optimizer', and 'cosine variance schedule' but does not specify version numbers or other ancillary software with versions. |
| Experiment Setup | Yes | We train it using patches of size d×d extracted from rendered images of the 3D objects in [26] curated from Adobe Stock. We use Lambertian shading from random light directions, with a random albedo in [0.5, 1] and without cast shadows or global illumination effects. ... At inference time, we use the DDIM sampler [48] with 50 sampling steps and with guidance. ... The model is trained using the AdamW optimizer for 500 epochs with learning rate 2e-4. |
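The guided sampling described in the Pseudocode row (a DDIM step with J_t = 3 inner guidance updates weighted by η_t and λ = 0.5) can be sketched as below. This is an illustrative reconstruction, not the authors' code: the toy loss here is a smoothness term only (the paper's guidance also includes integrability and lighting consistency), the analytic gradient and function names are assumptions, and the denoiser output `eps_pred` is taken as given.

```python
import numpy as np

def smoothness_grad(n):
    """Gradient of a toy smoothness loss: sum of squared finite
    differences between neighboring entries of the normal-field patch."""
    g = np.zeros_like(n)
    dh = n[:, 1:] - n[:, :-1]   # horizontal neighbor differences
    dv = n[1:, :] - n[:-1, :]   # vertical neighbor differences
    g[:, 1:] += 2 * dh
    g[:, :-1] -= 2 * dh
    g[1:, :] += 2 * dv
    g[:-1, :] -= 2 * dv
    return g

def guided_ddim_step(x_t, eps_pred, alpha_t, alpha_prev,
                     lam=0.5, eta_t=0.1, J_t=3):
    """One deterministic DDIM transition with consistency guidance.

    lam (λ) and J_t match the values reported in the paper; eta_t (η_t)
    is resolution-dependent in the paper, so 0.1 here is a placeholder.
    """
    # Estimate the clean sample x0 from the current noisy sample and
    # the denoiser's noise prediction (standard DDIM reparameterization).
    x0 = (x_t - np.sqrt(1.0 - alpha_t) * eps_pred) / np.sqrt(alpha_t)
    # J_t gradient updates pushing x0 toward the consistency constraint.
    for _ in range(J_t):
        x0 = x0 - eta_t * lam * smoothness_grad(x0)
    # Deterministic DDIM transition to the next (less noisy) sample.
    return np.sqrt(alpha_prev) * x0 + np.sqrt(1.0 - alpha_prev) * eps_pred
```

In the paper's multistable setting, running this sampler from different noise seeds is what yields distinct surface-normal explanations of the same ambiguous image patch.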
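The Dependencies and Setup rows name a "cosine variance schedule" without parameters. The standard form (Nichol & Dhariwal, 2021) is shown below for reference; the horizon `T` and offset `s` are common defaults, not values confirmed by the paper.

```python
import math

def cosine_alpha_bar(t, T=1000, s=0.008):
    """Cumulative signal level alpha-bar_t of the standard cosine
    variance schedule. T and s are assumed defaults; the paper names
    the schedule but does not report its parameters."""
    f = lambda u: math.cos((u / T + s) / (1.0 + s) * math.pi / 2.0) ** 2
    return f(t) / f(0)
```

The schedule starts at alpha-bar_0 = 1 (no noise) and decays smoothly toward 0 at t = T, which is what makes it a drop-in choice for short DDIM sampling runs like the 50-step inference reported here.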