Slot-guided Volumetric Object Radiance Fields
Authors: Di Qi, Tong Yang, Xiangyu Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach by showing top results in scene decomposition and generation tasks of complex synthetic datasets (e.g., Room-Diverse). Furthermore, we also confirm the potential of sVORF to segment objects in real-world scenes (e.g., the LLFF dataset). |
| Researcher Affiliation | Industry | Di Qi (MEGVII Technology Inc., qidi@megvii.com); Tong Yang (MEGVII Technology Inc., yangtong@megvii.com); Xiangyu Zhang (MEGVII Technology Inc., zhangxiangyu@megvii.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for open-source code. |
| Open Datasets | Yes | Datasets Following uORF [8], we experiment on several datasets in increasing order of complexity. CLEVR-567 [8]: The CLEVR [42] dataset is a widely used benchmark for evaluating object decomposition in computer vision. CLEVR-3D [35]: This dataset is also a variant of the CLEVR dataset... Room-Chair [8]: This dataset contains 1,000 scenes... Room-Diverse [8]: This dataset is an upgraded Room-Chair... MultiShapeNet (MSN) [35]: This dataset comprises 11,733 distinct shapes... Local Light Field Fusion (LLFF) [36]: This dataset includes real scene scenarios... |
| Dataset Splits | No | The paper mentions "validation set" once for CLEVR-3D, stating "test on the first 320 scenes of each validation set", which implies it's used for testing rather than a separate validation split for hyperparameter tuning. For other datasets, only train/test splits are specified, without a distinct validation set. |
| Hardware Specification | Yes | For all three methods, we train models on the CLEVR-567 dataset using a batch size of 1 on V100... for the CLEVR-567 and Room-Chair datasets, we train sVORF for approximately 7 hours using 8 Nvidia RTX 2080 Ti GPUs with batch size 16. The uORF and COLF models are trained on an Nvidia RTX V100 GPU... For the CLEVR-3D dataset, sVORF is trained for approximately 2 days using 8 Nvidia RTX V100 GPUs with batch size 16, while OSRT is trained for approximately 1 day on 8 A100 GPUs with a batch size of 256. |
| Software Dependencies | No | The paper mentions using specific model architectures like "ResNet34 [48]" and "ViT-Base [47]" as backbones, and the "Adam optimizer", but it does not specify any software dependencies (e.g., Python, PyTorch, TensorFlow) with version numbers that would allow for replication. |
| Experiment Setup | Yes | We utilize the Adam optimizer with a learning rate of 0.0001, β1 = 0.9, and β2 = 0.999. Additionally, we implement learning rate warm-up for the initial 1,000 iterations. The minimum number K of objects in each scene is customized separately as follows: K = 8, 7, 5, 5, 2, and 5 for the CLEVR-567, CLEVR-3D, Room-Chair, Room-Diverse, LLFF, and MSN datasets, respectively. To allow training on a high resolution, such as 256×256, we render individual pixels instead of large-sized patches. Specifically, we randomly sample a batch of 64 rays from the set of all pixels in the dataset, and then follow the hierarchical volume sampling [4] to query 64 samples from the coarse network and 128 samples from the fine network. (A hedged sketch of this configuration follows the table.) |
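
The Experiment Setup row above lists concrete hyperparameters (Adam with lr = 0.0001, β1 = 0.9, β2 = 0.999; a 1,000-iteration warm-up; batches of 64 randomly sampled rays; 64 coarse and 128 fine samples per ray). The PyTorch snippet below is a minimal sketch of how such a configuration could be wired up, assuming a linear warm-up and a placeholder model and loss; it is not the authors' released code, and the function names (`make_optimizer_and_warmup`, `sample_ray_batch`) are hypothetical.

```python
# Minimal sketch of the reported training configuration (not the authors' code):
# Adam (lr = 1e-4, betas = (0.9, 0.999)), linear warm-up over the first 1,000
# iterations, and per-step batches of 64 randomly sampled pixel rays with
# 64 coarse / 128 fine samples along each ray.
import torch


def make_optimizer_and_warmup(model: torch.nn.Module,
                              base_lr: float = 1e-4,
                              warmup_iters: int = 1000):
    """Adam optimizer plus a linear warm-up for the first `warmup_iters` steps."""
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr,
                                 betas=(0.9, 0.999))
    # Ramp the learning rate from ~0 to base_lr, then hold it constant
    # (the excerpt does not describe any decay schedule after warm-up).
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda it: min(1.0, (it + 1) / warmup_iters))
    return optimizer, scheduler


def sample_ray_batch(height: int, width: int, num_rays: int = 64) -> torch.Tensor:
    """Pick `num_rays` random pixel coordinates from a full-resolution image
    (e.g. 256x256), instead of rendering large patches."""
    idx = torch.randint(0, height * width, (num_rays,))
    return torch.stack((idx // width, idx % width), dim=-1)  # (num_rays, 2)


# Per-ray sample counts for hierarchical volume sampling, as reported above.
N_COARSE, N_FINE = 64, 128

if __name__ == "__main__":
    # Tiny stand-in model so the sketch runs end to end.
    model = torch.nn.Linear(3, 4)
    optimizer, scheduler = make_optimizer_and_warmup(model)

    for step in range(5):  # the real training loop runs far longer
        pixels = sample_ray_batch(256, 256)           # 64 random pixel rays
        dummy_input = torch.randn(pixels.shape[0], 3)  # placeholder ray features
        loss = model(dummy_input).pow(2).mean()        # placeholder rendering loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()                               # advances the warm-up
        print(step, scheduler.get_last_lr()[0])
```

The slot count K, the slot-guided volumetric decoder, and the actual coarse/fine NeRF networks are omitted here; the sketch only illustrates the optimizer, warm-up, and ray-batching choices quoted in the table.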