Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multistable Shape from Shading Emerges from Patch Diffusion
Authors: Xinran Han, Todd Zickler, Ko Nishino
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train a small denoising diffusion process to generate surface normal fields from 16 × 16 patches of synthetic images of everyday 3D objects. We deploy this model patch-wise at multiple scales, with guidance from inter-patch shape consistency constraints. Despite its relatively small parameter count and predominantly bottom-up structure, we show that multistable shape explanations emerge from this model for ambiguous test images that humans experience as being multistable. |
| Researcher Affiliation | Academia | Xinran Nicole Han Harvard University EMAIL Todd Zickler Harvard University EMAIL Ko Nishino Kyoto University EMAIL |
| Pseudocode | Yes | We provide the pseudocode for the single-scale spatial consistency guided sampling (Alg. 1) and the lighting consistency guidance (Alg. 2). Here, we have hyperparameters λ for weighting the smoothness and integrability loss, ηt as guidance update weight and Jt as the number of noise update steps. The results in our paper use λ = 0.5 and Jt = 3. The parameter ηt is resolution-dependent and is included with the schedule specification in Appendix A.10. |
| Open Source Code | Yes | We include in the supplemental material the implementation of the main algorithms in the paper. |
| Open Datasets | Yes | We train the pixel-space conditional diffusion model on a dataset that we build from the UniPS dataset [26]. It contains about 8000 256 × 256 synthetic images of 400 unique objects from the Adobe3D Assets [1] rendered from different viewing directions. |
| Dataset Splits | No | The paper does not explicitly state validation splits or sample counts for validation, only training and test. |
| Hardware Specification | Yes | It takes about 40 hours using one Nvidia A100 GPU. ... Runtime (seconds) 105s (single Quadro RTX 8000) 125s (single Quadro RTX 8000) |
| Software Dependencies | No | The paper mentions 'UNet', 'AdamW optimizer', and 'cosine variance schedule' but does not specify their version numbers or other ancillary software with versions. |
| Experiment Setup | Yes | We train it using patches of size d × d extracted from rendered images of the 3D objects in [26] curated from Adobe Stock. We use Lambertian shading from random light directions, with a random albedo in [0.5, 1] and without cast shadows or global illumination effects. ... At inference time, we use the DDIM sampler [48] with 50 sampling steps and with guidance. ... The model is trained using the AdamW optimizer for 500 epochs with learning rate 2e-4. |
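
The Experiment Setup row describes training images rendered with Lambertian shading from random light directions, a random albedo in [0.5, 1], and no cast shadows or global illumination. The sketch below illustrates that shading model on a grid of surface normals; it is not the authors' rendering code. The function names (`random_unit_light`, `lambertian_shade`, `shade_patch`) are hypothetical, and restricting the light to the camera-facing hemisphere is an assumption made here, since the summary does not say how light directions are sampled.

```python
import math
import random


def random_unit_light(rng):
    """Sample a unit light direction.

    Assumption (not from the source): the light is restricted to the
    hemisphere facing the camera, i.e. z >= 0.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-8:  # reject degenerate samples near the origin
            x, y, z = (c / norm for c in v)
            return (x, y, abs(z))


def lambertian_shade(normal, light, albedo):
    """Lambertian intensity: albedo * max(0, n . l).

    Only the local normal matters, so there are no cast shadows and no
    global illumination effects.
    """
    n_dot_l = sum(n * l for n, l in zip(normal, light))
    return albedo * max(0.0, n_dot_l)


def shade_patch(normals, rng):
    """Shade a grid of unit normals with one random light and one albedo."""
    light = random_unit_light(rng)
    albedo = rng.uniform(0.5, 1.0)  # albedo range stated in the summary
    return [[lambertian_shade(n, light, albedo) for n in row] for row in normals]


if __name__ == "__main__":
    rng = random.Random(0)
    # A fronto-parallel 16 x 16 patch: every normal points at the camera.
    flat = [[(0.0, 0.0, 1.0)] * 16 for _ in range(16)]
    image = shade_patch(flat, rng)
    print(len(image), len(image[0]))
```

Because the intensity depends only on the dot product between the normal and the light, different combinations of normals and light directions can produce the same image. This is the ambiguity that the paper's multistable shape explanations concern.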