Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Pro3D-Editor: A Progressive-Views Perspective for Consistent and Precise 3D Editing
Authors: Yang Zheng, Mengqi Huang, Nan Chen, Zhendong Mao
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method outperforms existing methods in editing accuracy and spatial consistency. Project Page: https://shuoyueli4519.github.io/Pro3D-Editor (Section 4: Experiments) |
| Researcher Affiliation | Academia | Yang Zheng (1), Mengqi Huang (1), Nan Chen (1), Zhendong Mao (1, 2). (1) University of Science and Technology of China; (2) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center. |
| Pseudocode | No | The paper describes the methodology in detailed prose and accompanying figures (e.g., Figure 3: Method overview) but does not include any clearly labeled pseudocode or algorithm blocks with structured, code-like formatting. |
| Open Source Code | No | The code and validation dataset are not publicly available at the time of submission. Nevertheless, the paper provides detailed descriptions of the methods, training procedures, and data preparation steps to facilitate reproducibility. We intend to release the code and validation dataset in the future. |
| Open Datasets | No | Our evaluation 3D dataset contains 6 objects and 15 editing prompts. To construct the evaluation image dataset, we render 72 views for each edited object by sampling azimuth angles every 5°. Since our method targets the relatively underexplored task of 3D editing, there are no existing standard benchmark datasets for evaluation. Consequently, we construct our own evaluation datasets tailored to this task. |
| Dataset Splits | No | Our evaluation 3D dataset contains 6 objects and 15 editing prompts. To construct the evaluation image dataset, we render 72 views for each edited object by sampling azimuth angles every 5°. The paper describes the construction of an evaluation dataset but does not specify any training, validation, or test splits for model development or evaluation in the main text or appendices. |
| Hardware Specification | Yes | The entire editing process runs on an A100 GPU and takes about 1.5 hours. We fine-tune the model for 800 steps, which takes 45 minutes on an A100 GPU. |
| Software Dependencies | No | We use the MV-Adapter SDXL checkpoint as our multi-view diffusion model. In our pipeline, we fine-tune the multi-view attention layers within the MV-Adapter network. For different views, we set distinct B matrices and identical A matrices, with the lora_rank set to 32 and lora_alpha set to 16. During training, the parameters of the A matrix are updated only by the gradients from the primary view. We fine-tune the model for 800 steps, which takes 45 minutes on an A100 GPU. During inference, we set the classifier-free guidance scale to 2. For 3D editing and refining, we first use a leave-one-out strategy to train the original 3DGS object for 10k steps, resulting in a degraded 3DGS. We then render the degraded views corresponding to the target perspectives and use them as the condition for ControlNet-Tile. Using the generated multi-views as the target, we add LoRA with a rank of 64 to all attention layers of the ControlNet and fine-tune for 1800 steps. Finally, we use the fine-tuned ControlNet-Tile to repair the rendered images of new perspectives and train the degraded 3DGS for an additional 10k steps. The entire 3D editing and refining process takes about 45 minutes. The paper mentions specific models such as "MV-Adapter SDXL checkpoint" and "ControlNet-Tile" but does not provide version numbers for these or for other underlying software dependencies (e.g., Python, PyTorch). |
| Experiment Setup | Yes | The weighting coefficient α in the Primary-view Sampler is set to 0.5. For the MoVE-LoRA, the rank of the shared matrix A is set to 32. The number of expert matrices B_i is set to 6. The weighting coefficient λ in the two-stage inference is set to 0.5. We employ a leave-one-out strategy, updating the 3DGS object using the edited multi-views by iteratively leaving out one view and training on the remaining views for 10k steps. Then we employ ControlNet-Tile as the base of the Full-view Refiner, injecting LoRA into all attention layers with rank = 64, and fine-tune it for 1800 steps with a learning rate of 1e-3. Finally, we continue updating the 3DGS object for an additional 10k steps. |
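The evaluation rendering described under Open Datasets uses 72 views per edited object, one every 5° of azimuth. Together these cover exactly 360° (72 × 5° = 360°). A minimal sketch of that camera schedule follows. It is an illustration, not the authors' rendering code: the orbit radius, the fixed elevation, and the function names `azimuth_schedule` and `camera_positions` are hypothetical, because the source specifies only the azimuth spacing and the view count.

```python
import math
from typing import List, Tuple


def azimuth_schedule(step_deg: float = 5.0) -> List[float]:
    """Azimuth angles (degrees) covering one full turn at a fixed spacing."""
    n_views = round(360.0 / step_deg)  # 72 views for a 5-degree step
    return [i * step_deg for i in range(n_views)]


def camera_positions(
    azimuths_deg: List[float],
    radius: float = 2.0,         # hypothetical: not stated in the source
    elevation_deg: float = 0.0,  # hypothetical: not stated in the source
) -> List[Tuple[float, float, float]]:
    """Place one camera per azimuth on a circular orbit around the object."""
    el = math.radians(elevation_deg)
    positions = []
    for az in azimuths_deg:
        a = math.radians(az)
        positions.append((
            radius * math.cos(el) * math.cos(a),
            radius * math.cos(el) * math.sin(a),
            radius * math.sin(el),
        ))
    return positions
```

Each returned position would be paired with a look-at transform toward the object center before rendering. That step is omitted here because the source does not describe the camera model.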
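The Software Dependencies and Experiment Setup rows describe MoVE-LoRA as follows: one shared down-projection A (rank 32), a separate up-projection B_i for each view (6 experts), and an A that is updated only by gradients from the primary view. The pure-Python sketch below illustrates this parameter sharing and gradient routing on a single linear layer. It rests on several assumptions. It uses the standard LoRA scaling `lora_alpha / rank` (16 / 32 = 0.5 with the reported values, matching the default in Hugging Face PEFT). It uses a plain SGD update. The class name `SharedALoRA` is hypothetical. This is not the authors' implementation; the source applies this fine-tuning to the multi-view attention layers of MV-Adapter rather than to a toy layer.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matvec(M: Matrix, v: List[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def outer(u: List[float], v: List[float]) -> Matrix:
    return [[a * b for b in v] for a in u]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


class SharedALoRA:
    """Hypothetical sketch: one shared A (rank x d_in), one B_i (d_out x rank) per view."""

    def __init__(self, d_in: int, d_out: int, rank: int = 32, alpha: float = 16,
                 n_views: int = 6, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.scale = alpha / rank  # assumed standard LoRA scaling: 16 / 32 = 0.5
        self.A: Matrix = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # Each B_i starts at zero, so the adapter initially leaves the frozen layer unchanged.
        self.B: List[Matrix] = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_views)]

    def delta(self, x: List[float], view: int) -> List[float]:
        """Low-rank update scale * B_view (A x), added to the frozen layer's output."""
        h = matvec(self.A, x)
        return [self.scale * v for v in matvec(self.B[view], h)]

    def grads(self, x: List[float], g: List[float], view: int,
              primary: int) -> Tuple[Matrix, Optional[Matrix]]:
        """Given g = dL/d(delta), return (dL/dB_view, dL/dA or None).

        dL/dB_view = scale * g (A x)^T and dL/dA = scale * (B_view^T g) x^T.
        The gradient for the shared A is dropped for every non-primary view.
        """
        h = matvec(self.A, x)
        g_b = [[self.scale * v for v in row] for row in outer(g, h)]
        if view != primary:
            return g_b, None
        g_h = matvec(transpose(self.B[view]), g)
        g_a = [[self.scale * v for v in row] for row in outer(g_h, x)]
        return g_b, g_a

    def sgd_step(self, x: List[float], g: List[float], view: int,
                 primary: int, lr: float) -> None:
        """Plain SGD update (hypothetical optimizer; the source does not name one)."""
        g_b, g_a = self.grads(x, g, view, primary)
        self.B[view] = [[b - lr * d for b, d in zip(rb, rd)]
                        for rb, rd in zip(self.B[view], g_b)]
        if g_a is not None:
            self.A = [[a - lr * d for a, d in zip(ra, rd)] for ra, rd in zip(self.A, g_a)]
```

Returning `None` for A's gradient on non-primary views acts as a stop-gradient. Those views still train their own B_i, but they cannot pull the shared A away from what the primary view has learned.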
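The leave-one-out strategy that produces the degraded 3DGS can be outlined as a training schedule. This sketch assumes six target views, matching the six expert matrices B_i, and rotates the held-out view at every step. The source says only that one view is left out iteratively over 10k steps. The rotation period, the function names, and the `train_step` callback (a stand-in for one 3DGS optimization step) are therefore all hypothetical.

```python
from collections import Counter
from typing import Callable, Iterator, List, Tuple


def leave_one_out_schedule(n_views: int,
                           total_steps: int) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (step, held_out_view, training_views) for each optimization step.

    Assumption: the held-out view rotates every step (step % n_views).
    """
    all_views = list(range(n_views))
    for step in range(total_steps):
        held_out = step % n_views
        yield step, held_out, [v for v in all_views if v != held_out]


def train_degraded_3dgs(train_step: Callable[[List[int]], None],
                        n_views: int = 6, total_steps: int = 10_000) -> Counter:
    """Run the hypothetical schedule and count how often each view was held out."""
    held_out_counts: Counter = Counter()
    for _, held_out, training_views in leave_one_out_schedule(n_views, total_steps):
        train_step(training_views)  # placeholder for one 3DGS update on the remaining views
        held_out_counts[held_out] += 1
    return held_out_counts
```

With 10,000 steps and six views, this schedule holds out each of views 0 through 3 exactly 1,667 times and views 4 and 5 exactly 1,666 times. Every view is therefore reconstructed from the others for roughly the same number of steps.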