DreamSparse: Escaping from Plato’s Cave with 2D Diffusion Model Given Sparse Views
Authors: Paul Yoo, Jiaxian Guo, Yutaka Matsuo, Shixiang (Shane) Gu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines in both trained and open-set category images. |
| Researcher Affiliation | Academia | Paul Yoo, Jiaxian Guo, Yutaka Matsuo, Shixiang Shane Gu; The University of Tokyo; {paulyoo, jiaxian.guo}@weblab.t.u-tokyo.ac.jp |
| Pseudocode | No | The paper describes the architecture and processes in detail in text and figures (e.g., Figure 2 for the overall pipeline) but does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | More results can be found on our project page: https://sites.google.com/view/dreamsparse-webpage. (Checked link; it states 'Code (coming soon)') |
| Open Datasets | Yes | Following SparseFusion [76], we perform experiments on real-world scenes from the Common Objects in 3D (CO3Dv2) [37]... We train and evaluate our framework on the CO3Dv2 [37] dataset's fewview_train and fewview_dev sequence sets respectively. ... We additionally train and evaluate our method and baselines on the cars category of the ShapeNet [5] synthetic dataset of object renderings. |
| Dataset Splits | Yes | We train and evaluate our framework on the CO3Dv2 [37] dataset's fewview_train and fewview_dev sequence sets respectively. ... For computing evaluation metrics, we select 10 objects per category and sample 32 uniformly spaced camera poses from the held-out test split. We then randomly select a specified number of context views from the camera poses and evaluate novel view synthesis results on the rest of the poses. |
| Hardware Specification | Yes | We jointly train the geometry and the spatial modules on 8 A100-40GB GPUs for 3 days with a batch size of 15. |
| Software Dependencies | Yes | We use Stable Diffusion v1.5 [42] as the frozen pre-trained diffusion model and DDIM [56] to synthesize novel views with 20 denoising steps. ... We use a ResNet50 [12] backbone ... We employ a Transformer [64]. |
| Experiment Setup | Yes | The resolutions of the feature map for the spatial guidance module and latent noise are set as 64 × 64 with spatial guidance weight λ = 2. The three transformers used in the geometry module all contain 4 layers... with a batch size of 15. To demonstrate our framework's generalization capability at object-level novel view synthesis, we trained our framework on a subset of 10 categories as specified in [37]. |
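The paper reports sampling with DDIM using 20 denoising steps on top of a frozen Stable Diffusion v1.5 model. A minimal sketch of a deterministic (η = 0) DDIM sampling loop with that step count is shown below; the noise predictor here is a hypothetical stand-in, since the paper's actual predictor (Stable Diffusion with spatial guidance from the geometry module) is not reproduced. Schedule constants (linear betas over 1000 training steps) are common defaults, not values quoted from the paper.

```python
import numpy as np

def ddim_sample(eps_model, shape, num_steps=20, num_train_steps=1000, seed=0):
    """Deterministic DDIM sampling (eta = 0) over `num_steps` timesteps."""
    rng = np.random.default_rng(seed)
    # Linear beta schedule and cumulative alpha products (assumed defaults).
    betas = np.linspace(1e-4, 0.02, num_train_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    # Evenly spaced subset of training timesteps, from noisy to clean.
    timesteps = np.linspace(num_train_steps - 1, 0, num_steps).astype(int)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps = eps_model(x, t)
        # Predict the clean sample, then take the deterministic DDIM update.
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return x

# Hypothetical stand-in for the noise predictor (not the paper's model).
dummy_eps = lambda x, t: 0.1 * x
sample = ddim_sample(dummy_eps, shape=(4, 4))
```

With 20 steps the loop visits an evenly spaced subset of the 1000 training timesteps, which is what makes DDIM sampling fast relative to full ancestral sampling.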