Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Slot-guided Volumetric Object Radiance Fields

Authors: Di Qi, Tong Yang, Xiangyu Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach by showing top results in scene decomposition and generation tasks of complex synthetic datasets (e.g., Room-Diverse). Furthermore, we also confirm the potential of sVORF to segment objects in real-world scenes (e.g., the LLFF dataset).
Researcher Affiliation | Industry | Di Qi, MEGVII Technology Inc. (EMAIL); Tong Yang, MEGVII Technology Inc. (EMAIL); Xiangyu Zhang, MEGVII Technology Inc. (EMAIL)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for open-source code.
Open Datasets | Yes | Datasets: Following uORF [8], we experiment on several datasets in increasing order of complexity. CLEVR-567 [8]: The CLEVR [42] dataset is a widely used benchmark for evaluating object decomposition in computer vision. CLEVR-3D [35]: This dataset is also a variant of the CLEVR dataset... Room-Chair [8]: This dataset contains 1,000 scenes... Room-Diverse [8]: This dataset is an upgraded Room-Chair... MultiShapeNet (MSN) [35]: This dataset comprises 11,733 distinct shapes... Local Light Field Fusion (LLFF) [36]: This dataset includes real scene scenarios...
Dataset Splits | No | The paper mentions a "validation set" only once, for CLEVR-3D, stating "test on the first 320 scenes of each validation set", which indicates that this set is used for testing rather than as a separate validation split for hyperparameter tuning. For the other datasets, only train/test splits are specified, with no distinct validation set.
Hardware Specification | Yes | For all three methods, we train models on the CLEVR-567 dataset using a batch size of 1 on V100... for the CLEVR-567 and Room-Chair datasets, we train sVORF for approximately 7 hours using 8 Nvidia RTX 2080 Ti GPUs with batch size 16. The uORF and COLF models are trained on an Nvidia V100 GPU... For the CLEVR-3D dataset, sVORF is trained for approximately 2 days using 8 Nvidia V100 GPUs with batch size 16, while OSRT is trained for approximately 1 day on 8 A100 GPUs with a batch size of 256.
Software Dependencies | No | The paper mentions using specific model architectures such as "ResNet34 [48]" and "ViT-Base [47]" as backbones, and the "Adam optimizer", but it does not specify any software dependencies (e.g., Python, PyTorch, TensorFlow) with version numbers that would allow for replication.
Experiment Setup | Yes | We utilize the Adam optimizer with a learning rate of 0.0001, β1 = 0.9, and β2 = 0.999. Additionally, we implement learning rate warm-up for the initial 1,000 iterations. The minimum number K of objects in each scene is customized separately as follows: K = 8, 7, 5, 5, 2, and 5 for the CLEVR-567, CLEVR-3D, Room-Chair, Room-Diverse, LLFF, and MSN datasets, respectively. To allow training on a high resolution, such as 256 × 256, we render individual pixels instead of large-sized patches. Specifically, we randomly sample a batch of 64 rays from the set of all pixels in the dataset, and then follow the hierarchical volume sampling [4] to query 64 samples from the coarse network and 128 samples from the fine network.
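The hyperparameters quoted in the Experiment Setup row can be collected into a small Python sketch. This is an illustrative configuration only, not the authors' code: the linear ramp used for warm-up is an assumption (the paper states only that warm-up is applied for the first 1,000 iterations), and all identifiers (`learning_rate`, `K_PER_DATASET`, etc.) are hypothetical names.

```python
# Hedged sketch of the training configuration described in the paper.
# Assumption: warm-up is a linear ramp from 0 to the base LR; the paper
# does not specify the warm-up shape.

BASE_LR = 1e-4        # Adam learning rate (β1 = 0.9, β2 = 0.999)
WARMUP_ITERS = 1_000  # warm-up duration in iterations

def learning_rate(step: int) -> float:
    """Return the LR at a given step, ramping linearly during warm-up."""
    if step < WARMUP_ITERS:
        return BASE_LR * (step + 1) / WARMUP_ITERS
    return BASE_LR

# Per-dataset value of K (the paper's per-scene object number), as listed above.
K_PER_DATASET = {
    "CLEVR-567": 8,
    "CLEVR-3D": 7,
    "Room-Chair": 5,
    "Room-Diverse": 5,
    "LLFF": 2,
    "MSN": 5,
}

# Per-step ray and sample budget (hierarchical volume sampling).
RAYS_PER_BATCH = 64    # rays sampled per training step
COARSE_SAMPLES = 64    # samples queried from the coarse network per ray
FINE_SAMPLES = 128     # samples queried from the fine network per ray
```

Sampling individual rays rather than patches is what makes 256 × 256 training tractable: the per-step memory cost scales with `RAYS_PER_BATCH * (COARSE_SAMPLES + FINE_SAMPLES)` rather than with the full image resolution.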