Unsupervised Discovery of Object Radiance Fields
Authors: Hong-Xing Yu, Leonidas Guibas, Jiajun Wu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate uORF on factorized scene representation learning (e.g., segmentation in 3D) and scene generation (e.g., novel view synthesis, scene editing in 3D). Our evaluation is on three datasets with gradually increasing complexity: first, CLEVR-like scenes with primitive foreground shapes; second, room scenes with complex chair shapes and textured backgrounds; third, more diverse room scenes with various foreground shapes and backgrounds. Our results show that uORF learns factorized representations that can segment 3D scenes into objects with fine shape details (e.g., thin chair legs) and backgrounds with well-recovered appearance details (e.g., irregular textures of a wooden floor). |
| Researcher Affiliation | Academia | Hong-Xing Yu, Stanford University; Leonidas J. Guibas, Stanford University; Jiajun Wu, Stanford University |
| Pseudocode | Yes | We show pseudo-code of our background-aware slot attention in Appendix (Alg. 1). |
| Open Source Code | Yes | Code and data can be found at https://kovenyu.com/uORF/. To ensure reproducibility of our work, we have provided the training and test code repository, together with all three synthetic datasets, and pre-trained models on all three datasets. |
| Open Datasets | Yes | CLEVR-567. The first dataset includes scenes of 5-7 CLEVR objects (Johnson et al., 2017)... Room-Diverse. The third dataset includes scenes of diverse foreground object shapes and background appearances... whose shape is randomly sampled from 1,200 ShapeNet chair shapes (Chang et al., 2015)... To ensure reproducibility of our work, we have provided the training and test code repository, together with all three synthetic datasets, and pre-trained models on all three datasets. |
| Dataset Splits | No | The paper specifies training and testing sets, but does not explicitly mention a dedicated 'validation' set or its split percentages/counts for model tuning. |
| Hardware Specification | Yes | Our model is trained on a single Nvidia RTX 3090 GPU for about 6 days. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'VGG16', and 'StyleGAN2' but does not provide specific version numbers for these or other libraries/frameworks. |
| Experiment Setup | Yes | We set λ_percept = 0.006, λ_adv = 0.01, λ_R = 10. For coarse training, we bilinearly downsample supervision images to 64×64. The coarse training lasts for 600K iterations. For fine training, we randomly crop 64×64 patches from 128×128 images. The fine training lasts for 600K iterations. For all networks except the discriminator, we use the Adam optimizer with learning rate 0.0003, β1 = 0.9 and β2 = 0.999. The learning rate is exponentially decreased by half every 200K iterations until after 600K iterations. We also adopt the learning rate warm-up from the slot attention paper (Locatello et al., 2020) for the first 1K iterations. We render each pixel with 64 samples. |
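
The "Experiment Setup" row above quotes concrete optimizer settings (Adam with lr 0.0003, β1 = 0.9, β2 = 0.999, a 1K-iteration warm-up, and halving the learning rate every 200K iterations until 600K). The following is a minimal PyTorch sketch of that schedule, not the authors' code: the function names and the use of `LambdaLR` are illustrative assumptions; only the numeric values come from the quoted text.

```python
# Hedged sketch of the optimizer and learning-rate schedule described in the
# "Experiment Setup" row. Only the numeric values are taken from the paper;
# all identifiers here are illustrative, not the authors' actual code.
import torch

LR = 3e-4                # reported learning rate (all networks except the discriminator)
BETAS = (0.9, 0.999)     # reported Adam betas
WARMUP_ITERS = 1_000     # warm-up over the first 1K iterations (Locatello et al., 2020)
DECAY_EVERY = 200_000    # halve the learning rate every 200K iterations
DECAY_UNTIL = 600_000    # no further decay after 600K iterations

# Reported loss weights; the loss terms themselves are not reproduced here.
LAMBDA_PERCEPT, LAMBDA_ADV, LAMBDA_R = 0.006, 0.01, 10.0


def lr_lambda(it: int) -> float:
    """Multiplicative LR factor: linear warm-up, then halving every 200K iters."""
    warmup = min(it / WARMUP_ITERS, 1.0)
    decay = 0.5 ** (min(it, DECAY_UNTIL) // DECAY_EVERY)
    return warmup * decay


def make_optimizer_and_scheduler(model: torch.nn.Module):
    """Builds Adam with the reported hyperparameters and the assumed LR schedule."""
    optimizer = torch.optim.Adam(model.parameters(), lr=LR, betas=BETAS)
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

In a training loop, `scheduler.step()` would be called once per iteration (not per epoch) so that the warm-up and the 200K-iteration decay intervals line up with the iteration counts reported in the paper.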