Slot-guided Volumetric Object Radiance Fields

Authors: Di Qi, Tong Yang, Xiangyu Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach by showing top results in scene decomposition and generation tasks of complex synthetic datasets (e.g., Room-Diverse). Furthermore, we also confirm the potential of sVORF to segment objects in real-world scenes (e.g., the LLFF dataset).
Researcher Affiliation | Industry | Di Qi (MEGVII Technology Inc., qidi@megvii.com), Tong Yang (MEGVII Technology Inc., yangtong@megvii.com), Xiangyu Zhang (MEGVII Technology Inc., zhangxiangyu@megvii.com)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information for open-source code (e.g., a repository link or an explicit statement of code release).
Open Datasets | Yes | Datasets: Following uORF [8], we experiment on several datasets in increasing order of complexity. CLEVR-567 [8]: The CLEVR [42] dataset is a widely used benchmark for evaluating object decomposition in computer vision. CLEVR-3D [35]: This dataset is also a variant of the CLEVR dataset... Room-Chair [8]: This dataset contains 1,000 scenes... Room-Diverse [8]: This dataset is an upgraded Room-Chair... MultiShapeNet (MSN) [35]: This dataset comprises 11,733 distinct shapes... Local Light Field Fusion (LLFF) [36]: This dataset includes real scene scenarios...
Dataset Splits | No | The paper mentions a "validation set" only once, for CLEVR-3D, stating "test on the first 320 scenes of each validation set", which implies it is used for testing rather than as a separate validation split for hyperparameter tuning. For the other datasets, only train/test splits are specified, without a distinct validation set.
Hardware Specification | Yes | For all three methods, we train models on the CLEVR-567 dataset using a batch size of 1 on V100... for the CLEVR-567 and Room-Chair datasets, we train sVORF for approximately 7 hours using 8 Nvidia RTX 2080 Ti GPUs with batch size 16. The uORF and COLF models are trained on an Nvidia RTX V100 GPU... For the CLEVR-3D dataset, sVORF is trained for approximately 2 days using 8 Nvidia RTX V100 GPUs with batch size 16, while OSRT is trained for approximately 1 day on 8 A100 GPUs with a batch size of 256.
Software Dependencies | No | The paper mentions using specific model architectures like "ResNet34 [48]" and "ViT-Base [47]" as backbones, and the "Adam optimizer", but it does not specify any software dependencies (e.g., Python, PyTorch, TensorFlow) with version numbers that would allow for replication.
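
Because no framework or versions are given, a replication would have to guess the software stack. The following is a minimal sketch, assuming PyTorch with torchvision and timm (none of which the paper confirms), of how ResNet34 and ViT-Base backbones are commonly instantiated; the input resolution and feature-extraction details are likewise illustrative assumptions rather than the authors' configuration.

```python
# Hypothetical backbone setup; the paper does not name a framework or versions.
# PyTorch + torchvision + timm are assumed purely for illustration.
import torch
import torchvision.models as tvm
import timm

# ResNet34 used as a feature extractor (classification head removed).
resnet34 = tvm.resnet34(weights=None)
resnet34.fc = torch.nn.Identity()

# ViT-Base backbone; patch size and input resolution are illustrative guesses.
vit_base = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0)

x = torch.randn(1, 3, 224, 224)   # dummy image batch
print(resnet34(x).shape)           # -> torch.Size([1, 512])
print(vit_base(x).shape)           # -> torch.Size([1, 768])
```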
Experiment Setup | Yes | We utilize the Adam optimizer with a learning rate of 0.0001, β1 = 0.9, and β2 = 0.999. Additionally, we implement learning rate warm-up for the initial 1,000 iterations. The minimum number K of objects in each scene is customized separately as follows: K = 8, 7, 5, 5, 2, and 5 for the CLEVR-567, CLEVR-3D, Room-Chair, Room-Diverse, LLFF, and MSN datasets, respectively. To allow training on a high resolution, such as 256 × 256, we render individual pixels instead of large-sized patches. Specifically, we randomly sample a batch of 64 rays from the set of all pixels in the dataset, and then follow the hierarchical volume sampling [4] to query 64 samples from the coarse network and 128 samples from the fine network.
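
The quoted setup maps directly onto an optimizer and ray-sampling configuration. Below is a minimal sketch, assuming PyTorch and a linear warm-up schedule (the paper states neither); `model`, the ray/target tensors, and the training loop are placeholders rather than the authors' implementation, and only the numeric hyperparameters (learning rate, betas, warm-up length, ray and sample counts, per-dataset K) come from the paper.

```python
# Minimal sketch of the quoted optimization setup, assuming PyTorch.
# Model and ray-sampling internals are placeholders, not the authors' code.
import torch

# Per-dataset K values as quoted in the paper.
K_PER_DATASET = {"CLEVR-567": 8, "CLEVR-3D": 7, "Room-Chair": 5,
                 "Room-Diverse": 5, "LLFF": 2, "MSN": 5}

model = torch.nn.Linear(63, 4)   # stand-in for the coarse/fine radiance-field MLPs

# Adam with lr = 1e-4, beta1 = 0.9, beta2 = 0.999, as stated in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

# Linear learning-rate warm-up over the first 1,000 iterations (schedule shape assumed).
warmup = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / 1000)
)

N_RAYS = 64      # rays (individual pixels) sampled per iteration, not patches
N_COARSE = 64    # samples per ray for the coarse network
N_FINE = 128     # additional samples per ray for the fine network

for step in range(2000):
    # Placeholder batch: in practice these rays would come from random pixels of a
    # 256 × 256 training image, with coarse/fine samples drawn via the hierarchical
    # volume sampling of NeRF [4].
    rays = torch.randn(N_RAYS, 63)
    target = torch.randn(N_RAYS, 4)

    loss = torch.nn.functional.mse_loss(model(rays), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    warmup.step()
```

A linear LambdaLR ramp is only one common way to realize a 1,000-iteration warm-up; the paper does not describe the schedule's exact shape, nor whether the coarse and fine networks share a single optimizer.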