Disentangled 3D Scene Generation with Layout Learning
Authors: Dave Epstein, Ben Poole, Ben Mildenhall, Alexei A. Efros, Aleksander Holynski
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examine the ability of layout learning to generate and disentangle 3D scenes across a wide range of text prompts. We first verify our method's effectiveness through an ablation study and comparison to baselines, and then demonstrate various applications enabled by layout learning. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, UC Berkeley; ²Google Research. |
| Pseudocode | Yes | Figure 9: Pseudocode for layout learning, with segments inherited from previous work abstracted into functions. Figure 10: Pseudocode for empty NeRF regularization, where soft_bin_acc computes α_bin in Equation 5. (A hedged sketch of this helper appears below the table.) |
| Open Source Code | No | The paper provides a project page link (https://dave.ml/layoutlearning/) but does not explicitly state that the source code for the described methodology is available there or elsewhere. |
| Open Datasets | No | The paper uses a pretrained text-to-image diffusion model (Imagen) and a list of prompts for generation and evaluation, but it does not identify a publicly available dataset used to train its own models from scratch, nor does it provide access information for any dataset used in its experiments beyond the prompts. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Mip-NeRF 360, Imagen, and Shampoo, but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | We use λ_dist = 0.001, λ_acc = 0.01, λ_ori = 0.01, as well as λ_empty = 0.05. We initialize parameters s(i) ∼ N(1, 0.3), t(i) ∼ N(0, 0.3), and q(i) ∼ N(µ_i, 0.1), where µ_i is 1 for the last element and 0 for all others. We use a 10× higher learning rate to train layout parameters. We use a classifier-free guidance strength of 200 and a textureless shading probability of 0.1 for SDS (Poole et al., 2022), disabling view-dependent prompting as it does not aid the generation of compositional scenes (Table 3b). We optimize our model with Shampoo (Gupta et al., 2018) with a batch size of 1 for 15,000 steps with an annealed learning rate, starting from 10⁻⁹, peaking at 10⁻⁴ after 3,000 steps, and decaying to 10⁻⁶. (A hedged configuration sketch follows the table.) |
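
To make the quoted experiment setup concrete, the sketch below reconstructs the layout-parameter initialization and the annealed learning-rate schedule from the numbers in the row above. The function names (`init_layout_params`, `lr_at_step`) are hypothetical, and the linear-warmup/cosine-decay shape is an assumption: the paper states only the endpoints (10⁻⁹, 10⁻⁴, 10⁻⁶) and the peak step, not the annealing curve.

```python
import numpy as np

# Loss weights and schedule constants quoted in the "Experiment Setup" row.
LAMBDA_DIST, LAMBDA_ACC, LAMBDA_ORI, LAMBDA_EMPTY = 0.001, 0.01, 0.01, 0.05
TOTAL_STEPS, PEAK_STEP = 15_000, 3_000
LR_START, LR_PEAK, LR_END = 1e-9, 1e-4, 1e-6
LAYOUT_LR_MULT = 10.0  # layout parameters train at 10x the base rate

def init_layout_params(n_objects, rng=np.random.default_rng(0)):
    """Sample per-object layout parameters as quoted: s ~ N(1, 0.3),
    t ~ N(0, 0.3), q ~ N(mu, 0.1) with mu = (0, 0, 0, 1), i.e. a
    quaternion biased toward the identity rotation."""
    s = rng.normal(1.0, 0.3, size=(n_objects, 1))           # scale
    t = rng.normal(0.0, 0.3, size=(n_objects, 3))           # translation
    mu = np.array([0.0, 0.0, 0.0, 1.0])                     # last element = 1
    q = rng.normal(loc=mu, scale=0.1, size=(n_objects, 4))  # rotation (quat)
    return s, t, q

def lr_at_step(step):
    """Annealed base LR: 1e-9 -> 1e-4 over the first 3k steps, then down
    to 1e-6 by step 15k. Linear warmup followed by cosine decay is an
    assumed shape; only the endpoints and peak step are stated."""
    if step < PEAK_STEP:
        return LR_START + (step / PEAK_STEP) * (LR_PEAK - LR_START)
    frac = (step - PEAK_STEP) / (TOTAL_STEPS - PEAK_STEP)
    return LR_END + 0.5 * (LR_PEAK - LR_END) * (1.0 + np.cos(np.pi * frac))
```

Per the quoted setup, the layout parameters would be optimized at ten times this base rate (`LAYOUT_LR_MULT`), with Shampoo as the optimizer.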
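
The pseudocode row also references a `soft_bin_acc` helper that computes α_bin for the empty-NeRF regularizer (Equation 5 of the paper, not reproduced in this report). The sketch below is therefore only a plausible reading: it soft-binarizes each ray's accumulated opacity with a steep sigmoid so that penalizing the mean pushes a designated NeRF toward rendering nothing. The sigmoid form, the `sharpness` value, and the 0.5 threshold are all assumptions; only the weight λ_empty = 0.05 is quoted.

```python
import numpy as np

def soft_bin_acc(acc, sharpness=50.0, threshold=0.5):
    """Assumed form of alpha_bin: a steep sigmoid soft-binarizes each ray's
    accumulated opacity, standing in for a hard (acc > threshold) test."""
    return 1.0 / (1.0 + np.exp(-sharpness * (np.asarray(acc) - threshold)))

def empty_nerf_loss(acc, lambda_empty=0.05):
    """Mean soft occupancy over rays, weighted by the quoted lambda_empty;
    minimizing this drives the designated NeRF toward emptiness."""
    return lambda_empty * np.mean(soft_bin_acc(acc))
```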