ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields

Authors: Jiahua Dong, Yu-Xiong Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that ViCA-NeRF provides more flexible, efficient (3 times faster) editing with higher levels of consistency and detail than the state of the art. Our code is available at: https://github.com/Dongjiahua/VICA-NeRF. We conduct experiments on various scenes and text prompts. All our experiments are based on real scenes with NeRFStudio [9]. We first show qualitative results and comparisons between our method and Instruct-NeRF2NeRF [2]. For artistic stylization, we test our method on the cases from NeRF-Art [34]. Experiments show that we achieve more detailed edits. Based on these scenes, we further conduct ablation studies on our method, including the effects of different components in the framework, the warm-up strategy, failure cases, and representative hyperparameters. We also conduct a quantitative evaluation of the results, testing the textual alignment, consistency, and efficiency of our method (see the CLIP-scoring sketch after this table).
Researcher Affiliation | Academia | Jiahua Dong, Yu-Xiong Wang, University of Illinois Urbana-Champaign, {jiahuad2, yxw}@illinois.edu
Pseudocode | No | The paper describes procedures in text and uses figures to illustrate concepts, but it does not include a dedicated pseudocode block or algorithm listing.
Open Source Code | Yes | Our code is available at: https://github.com/Dongjiahua/VICA-NeRF.
Open Datasets | No | The paper states: 'All our experiments are based on real scenes with NeRFStudio [9]' and 'Our method is evaluated on various scenes and text prompts, ranging from faces to outdoor scenes.' While NeRFStudio is a known framework, the specific 'real scenes' used for training are not identified as publicly available datasets with links or formal citations, so concrete access information for the training data is not provided.
Dataset Splits | No | The paper mentions 'pre-train the model for 30,000 iterations' and 'continue to train the model', and describes 'warm-up' and 'post-refinement' stages, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | Yes | We analyze the time cost of the Face scene and compare it with Instruct-NeRF2NeRF on an RTX 4090 GPU.
Software Dependencies | No | The paper mentions using the 'nerfacto model from NeRFStudio', 'Instruct-Pix2Pix [3] as our 2D editing model', and 'the diffusion model', but it does not provide specific version numbers for these packages or for any underlying libraries (e.g., Python or PyTorch versions) (see the diffusers loading sketch after this table).
Experiment Setup | Yes | For training the NeRF, we use an L1 and LPIPS (Learned Perceptual Image Patch Similarity) loss throughout the process (see the loss sketch after this table). Initially, we pre-train the model for 30,000 iterations following the nerfacto configuration. Subsequently, we continue to train the model using our proposed method. The optional post-refinement process occurs at 35,000 iterations, and we obtain the final results at 40,000 iterations. During the editing of key views, the input timestep t is set within the range [0.5, 0.9], and we employ 10 diffusion steps for this phase. In the blending refinement model, we set t to 0.6 and n_r to 5, and use only 3 diffusion steps. The diffusion model provides additional adjustable hyperparameters, such as the image guidance scale S_I and the text guidance scale S_T. We adopt the model's default values of S_I = 1.5 and S_T = 7.5 without manual adjustment.
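
The quantitative evaluation row above mentions textual alignment; metrics of this kind are commonly computed as CLIP image-text similarity over rendered views. Below is a minimal sketch under that assumption; the checkpoint name, the `frames` list, and the averaging scheme are illustrative and not the paper's exact protocol:

```python
# Sketch: scoring textual alignment of edited renders with CLIP.
# Assumptions (not from the paper): the checkpoint below, and `frames`,
# a list of PIL.Image renders of the edited scene.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_alignment(frames, prompt):
    inputs = processor(text=[prompt], images=frames,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Embeddings are L2-normalized; re-normalize defensively.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    # Mean cosine similarity between each frame and the edit prompt.
    return (img @ txt.T).mean().item()
```

A higher score indicates renders that better match the text prompt; frame-to-frame variants of the same idea are often used to probe cross-view consistency.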
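
Since the software-dependencies row notes that no library versions are pinned, a reproducer has to choose their own stack. One plausible way to invoke Instruct-Pix2Pix with the guidance scales and step count quoted in the experiment setup is through the diffusers library; the checkpoint name, prompt, and `view` image below are assumptions, not details from the paper:

```python
# Sketch: loading a 2D editing model via diffusers (an assumption;
# the paper does not name the library or checkpoint it used).
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# `view` would be a PIL.Image render of one key view; the prompt is hypothetical.
edited = pipe(
    "turn him into a bronze statue",
    image=view,
    num_inference_steps=10,    # 10 diffusion steps, as in the key-view phase
    image_guidance_scale=1.5,  # S_I = 1.5
    guidance_scale=7.5,        # S_T = 7.5
).images[0]
```

Pinning exact versions of diffusers, PyTorch, and NeRFStudio in a requirements file would close the reproducibility gap this row flags.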
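
The experiment setup states that NeRF training uses an L1 plus LPIPS loss. A minimal sketch of that objective with the lpips package follows; the LPIPS backbone and the weighting term `w_lpips` are assumptions the paper does not specify:

```python
# Sketch of the stated training objective: L1 + LPIPS.
import torch
import torch.nn.functional as F
import lpips

lpips_fn = lpips.LPIPS(net="vgg")  # backbone choice is an assumption
w_lpips = 1.0                      # hypothetical weighting term

def edit_loss(pred_rgb, target_rgb):
    """pred_rgb, target_rgb: (B, 3, H, W) tensors in [0, 1]."""
    l1 = F.l1_loss(pred_rgb, target_rgb)
    # lpips expects inputs scaled to [-1, 1]
    perc = lpips_fn(pred_rgb * 2 - 1, target_rgb * 2 - 1).mean()
    return l1 + w_lpips * perc
```

Under the schedule quoted above, this loss would apply throughout: nerfacto pre-training to 30,000 iterations, editing-phase training thereafter, optional post-refinement at 35,000, and final results at 40,000 iterations.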