Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes

Authors: Suhani Vora, Noha Radwan, Klaus Greff, Henning Meyer, Kyle Genova, Mehdi S. M. Sajjadi, Etienne Pot, Andrea Tagliasacchi, Daniel Duckworth

TMLR 2022

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental
LLM Response: "Our empirical analysis demonstrates comparable quality to competitive 2D and 3D semantic segmentation baselines on complex, realistically-rendered scenes and significantly outperforms a comparable neural radiance field-based method on a series of tasks requiring 3D reasoning."
Researcher Affiliation: Industry
LLM Response: "Suhani Vora EMAIL Google Research; Noha Radwan EMAIL Google Research; Klaus Greff EMAIL Google Research; Henning Meyer EMAIL Google Research; Kyle Genova EMAIL Google Research; Mehdi S. M. Sajjadi EMAIL Google Research; Etienne Pot EMAIL Google Research; Andrea Tagliasacchi EMAIL Google Research, Simon Fraser University; Daniel Duckworth EMAIL Google Research"
Pseudocode: No
LLM Response: "The paper describes the methodology in detailed prose and through figures like Figure 2, which illustrates the architecture. However, it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks."
Open Source Code: No
LLM Response: "We release these datasets along with code to reproduce them to the community upon publication. ... We intend to release the code, datasets and pre-trained models upon publication."
Open Datasets: Yes
LLM Response: "As large scale datasets of 3D semantically annotated scenes with sufficient high quality RGB views are scarce, we propose three novel datasets of increasing complexity: KLEVR, ToyBox5, and ToyBox13. ... We release these datasets along with code to reproduce them to the community upon publication. ... These datasets, along with accompanying code and pretrained NeRF models, are publicly available on our project website (to be linked upon publication)."
Dataset Splits: Yes
LLM Response: "To enable evaluation from novel views within the same scene, we randomly partition each scene's frames into train cameras and test cameras; the latter representing the set typically used to evaluate methods in novel view synthesis (Mildenhall et al., 2020). For evaluation across scenes, we further partition scenes into train scenes and novel scenes. Table 1 depicts the statistics for each of the proposed datasets. ... We train NeRF models on all train cameras from all train scenes. We provide NeSF with supervision from semantic maps corresponding to 9 randomly-chosen cameras per scene. ... For 2D evaluation, we randomly select 4 cameras from each novel scene's train cameras."
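The per-scene camera partition quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, frame IDs, and split sizes are hypothetical, and a fixed seed stands in for whatever randomization the paper actually used.

```python
import random

def split_cameras(frame_ids, n_test, seed=0):
    """Randomly partition one scene's frames into train and test cameras.

    A hypothetical sketch of the split described in the paper; the real
    pipeline's frame identifiers and split sizes may differ.
    """
    rng = random.Random(seed)
    test = sorted(rng.sample(frame_ids, n_test))
    test_set = set(test)
    train = [f for f in frame_ids if f not in test_set]
    return train, test

# Example: hold out some frames of a scene as novel-view test cameras.
train, test = split_cameras(list(range(300)), n_test=30)
```

The same primitive applies one level up as well: partitioning scene IDs (rather than frame IDs) into train scenes and novel scenes for cross-scene evaluation.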
Hardware Specification: Yes
LLM Response: "While neural radiance fields are acknowledged to be slow to train, we find that we are able to fit a single model to sufficient quality in 20 minutes on eight TPUv3 cores on the Google Cloud Platform. ... Our models are trained on 32 TPUv3 cores. ... We train DeepLab v3 with WideResNet (Wu et al., 2019) for 55k steps on 16 TPUv3 chips. ... We train SparseConvNet asynchronously on 20 NVIDIA V100 GPUs with momentum using a base learning rate of 1.5e-2 and decaying to 0 over the final 250k steps of training."
Software Dependencies: No
LLM Response: "The paper mentions several software components and frameworks used (e.g., Adam optimizer, DeepLab v3, SparseConvNet, NeRF, UNet, Kubric, Blender) but does not provide specific version numbers for any of these, which is required for a reproducible description of software dependencies."
Experiment Setup: Yes
LLM Response: "Each scene is preprocessed by training an independent NeRF for 25k steps with Adam using an initial learning rate of 1e-3 decaying to 5.4e-4 according to a cosine rule. ... NeSF is trained for 5k steps using Adam with an initial learning rate of 1e-3 decaying to 4e-4. As input for NeSF, we discretize density fields by densely probing with ε=1/32, resulting in 64³ evenly-spaced points in [-1, +1]³. This density grid is then processed by the 3D UNet architecture of Çiçek et al. (2016) with 32, 64, and 128 channels at each stage of downsampling. The semantic latent vector is processed by a multilayer perceptron consisting of 2 hidden layers of 128 units. ... We train DeepLab v3 with WideResNet (Wu et al., 2019) for 55k steps on 16 TPUv3 chips. ... We train SparseConvNet asynchronously on 20 NVIDIA V100 GPUs with momentum using a base learning rate of 1.5e-2 and decaying to 0 over the final 250k steps of training."
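Two pieces of the setup quoted above are easy to make concrete: the cosine learning-rate decay (1e-3 down to 5.4e-4 over 25k steps) and the 64³ density probe grid over [-1, +1]³. The sketch below is an assumption-laden reading, not the authors' implementation: the exact cosine formula and the grid's endpoint handling are guesses consistent with the quoted numbers.

```python
import math
import numpy as np

def cosine_lr(step, total_steps=25_000, lr_init=1e-3, lr_final=5.4e-4):
    # Cosine decay from lr_init (at step 0) to lr_final (at total_steps).
    # The precise schedule used in the paper is assumed, not confirmed.
    t = min(step, total_steps) / total_steps
    return lr_final + 0.5 * (lr_init - lr_final) * (1.0 + math.cos(math.pi * t))

# 64^3 evenly spaced probe points in [-1, +1]^3.
# Endpoint handling is assumed; with both endpoints included the spacing
# is 2/63, close to the quoted epsilon = 1/32.
xs = np.linspace(-1.0, 1.0, 64)
grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)  # (64, 64, 64, 3)
```

Densities probed at `grid` would then form the input volume consumed by the 3D UNet mentioned in the quote.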