Decomposing NeRF for Editing via Feature Field Distillation
Authors: Sosuke Kobayashi, Eiichi Matsumoto, Vincent Sitzmann
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments validate that the distilled feature fields can transfer recent progress in 2D vision and language foundation models to 3D scene representations, enabling convincing 3D segmentation and selective editing of emerging neural graphics representations. In extensive experiments, we investigate the applications of neural feature fields with two different pre-trained teacher networks. (A sketch of the distillation loss follows the table.) |
| Researcher Affiliation | Collaboration | Sosuke Kobayashi (Preferred Networks, Inc., sosk@preferred.jp); Eiichi Matsumoto (Preferred Networks, Inc., matsumoto@preferred.jp); Vincent Sitzmann (Massachusetts Institute of Technology, sitzmann@mit.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The complete code for the reproduction of all the experimental results is not publicly available. |
| Open Datasets | Yes | We construct a 3D semantic segmentation benchmark from four scenes in the Replica dataset [86] with data split and posed images provided by [112]. |
| Dataset Splits | No | The paper mentions that 'data split and posed images provided by [112]' were used, but it does not specify the exact percentages or counts for training, validation, or test splits. No explicit validation set details are provided. |
| Hardware Specification | No | The paper states: 'It is difficult to completely track and sum the total amount of computing in the experiments. Instead, we reported the setup of the main experiments.' It does not provide specific hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using specific teacher networks (LSeg [44], DINO [12]) and follows settings from another paper ([112]) for NeRF implementation, but it does not provide version numbers for any ancillary software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | During the training of 200K iterations, the loss L in Equation 4 is minimized by Adam with a linearly decaying learning rate (5e-4 to 8e-5). During training, Gaussian noise for density is also applied. The number of coarse and fine samplings is 64 and 128, respectively. The MLP of the neural radiance field consists of eight ReLU layers with 256 dimensions, followed by a linear layer for density, three layers for color, and three layers for feature, as shown in Fig. 1. Positional encoding of length 10 is used for the input coordinate and its skip connection, and positional encoding of length 4 for the viewing direction. The size of a training image is 320×240 for the Replica dataset and 1008×756 for the other datasets. The batch size of training rays is 1024 for Replica and 2048 for the others. During finetuning of feature fields or radiance fields, Gaussian noise is removed, and the learning rate is set to 1e-4. See appendix A and C for further training details. (Sketches of the network and training schedule follow the table.) |
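
The feature field distillation quoted in the Research Type row works by volume-rendering a per-point feature vector with the same density-derived compositing weights used for color, then matching the rendered feature to the frozen 2D teacher's feature at the corresponding pixel. Below is a minimal PyTorch sketch of the combined loss (cf. the reference to Equation 4 above); the L1 feature norm, the `lam` weighting, and the gradient stop on the weights are illustrative assumptions, not values confirmed by this excerpt.

```python
import torch

def volume_render(weights, values):
    # Alpha-composite per-sample values along each ray.
    # weights: (num_rays, num_samples) compositing weights derived from density
    # values:  (num_rays, num_samples, dim) per-sample colors or features
    return (weights.unsqueeze(-1) * values).sum(dim=1)

def dff_loss(weights, colors, features, gt_rgb, teacher_feat, lam=0.04):
    # Photometric NeRF loss plus feature distillation against a frozen
    # 2D teacher (e.g. LSeg or DINO) sampled at the rays' pixels.
    # Detaching the weights keeps the feature loss from altering the
    # geometry; whether the paper does this is not stated in this excerpt.
    rendered_rgb = volume_render(weights, colors)              # (num_rays, 3)
    rendered_feat = volume_render(weights.detach(), features)  # (num_rays, feat_dim)
    photometric = ((rendered_rgb - gt_rgb) ** 2).mean()        # standard NeRF MSE
    distillation = (rendered_feat - teacher_feat).abs().mean() # L1 to teacher
    return photometric + lam * distillation
```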
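
The Experiment Setup row pins the backbone down precisely enough to sketch it. The PyTorch rendering below follows that description; the skip position after the fourth layer and the 512-dimensional feature head (matching LSeg's embedding size) are assumptions drawn from common NeRF practice, since the excerpt does not state them.

```python
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs):
    # NeRF-style sin/cos encoding: the raw input plus num_freqs octaves.
    feats = [x]
    for i in range(num_freqs):
        feats += [torch.sin(2.0 ** i * x), torch.cos(2.0 ** i * x)]
    return torch.cat(feats, dim=-1)

class DistilledFeatureNeRF(nn.Module):
    # Eight 256-d ReLU layers with a skip connection, then a linear density
    # head, a three-layer color head, and a three-layer feature head, per
    # the Experiment Setup row. Skip location and head widths are assumed.
    def __init__(self, feat_dim=512, width=256):
        super().__init__()
        pos_dim = 3 * (1 + 2 * 10)  # encoding of length 10 for coordinates
        dir_dim = 3 * (1 + 2 * 4)   # encoding of length 4 for view directions
        self.pos_dim, self.dir_dim = pos_dim, dir_dim
        self.trunk_a = nn.Sequential(
            nn.Linear(pos_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        self.trunk_b = nn.Sequential(  # skip: re-concatenate the encoded input
            nn.Linear(width + pos_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        self.density_head = nn.Linear(width, 1)
        self.color_head = nn.Sequential(  # three layers, view-conditioned
            nn.Linear(width + dir_dim, width // 2), nn.ReLU(),
            nn.Linear(width // 2, width // 2), nn.ReLU(),
            nn.Linear(width // 2, 3),
        )
        self.feature_head = nn.Sequential(  # three layers, view-independent
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, feat_dim),
        )

    def forward(self, xyz, view_dir):
        x = positional_encoding(xyz, num_freqs=10)
        d = positional_encoding(view_dir, num_freqs=4)
        h = self.trunk_a(x)
        h = self.trunk_b(torch.cat([h, x], dim=-1))
        sigma = self.density_head(h)
        rgb = torch.sigmoid(self.color_head(torch.cat([h, d], dim=-1)))
        feat = self.feature_head(h)
        return sigma, rgb, feat
```

A forward pass takes per-sample 3D points and unit view directions and returns density, RGB, and the distilled feature, which are then composited along each ray as in the loss sketch above.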
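
The quoted optimizer settings (Adam, learning rate decaying linearly from 5e-4 to 8e-5 over 200K iterations) map onto a `LambdaLR` schedule as follows; the use of `LambdaLR` is an implementation choice, not something the paper specifies.

```python
import torch

model = DistilledFeatureNeRF()  # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

num_iters = 200_000
final_ratio = 8e-5 / 5e-4  # decay endpoint relative to the initial rate

# Linear decay from 5e-4 at step 0 to 8e-5 at step 200K; call
# scheduler.step() once per training iteration.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: 1.0 - (1.0 - final_ratio) * min(step, num_iters) / num_iters,
)
```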