MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing

Authors: Chenjie Cao, Chaohui Yu, Fan Wang, Xiangyang Xue, Yanwei Fu

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Sufficient scene-level experiments on both object-centric and forward-facing datasets verify the effectiveness of MVInpainter across diverse tasks, such as multi-view object removal, synthesis, insertion, and replacement. |
| Researcher Affiliation | Collaboration | Chenjie Cao (1, 2, 3), Chaohui Yu (2, 3), Fan Wang (2, 3), Xiangyang Xue (1), Yanwei Fu (1); affiliations: 1 Fudan University, 2 DAMO Academy (Alibaba Group), 3 Hupan Lab. |
| Pseudocode | No | The paper includes figures illustrating the pipeline and components (e.g., Figure 2, Figure 3), but it does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | "our codes will also be open-released." |
| Open Datasets | Yes | MVInpainter-O is trained on object-centric data covering the full categories of CO3D [57] and MVImgNet [95], with Omni3D [6] used for zero-shot validation. MVInpainter-F is trained on forward-facing data from Real10K [103], ScanNet++ [89], and DL3DV [41], including both indoor and outdoor scenes. A comparison on SPIn-NeRF [51] further verifies the object-removal ability. |
| Dataset Splits | No | The paper mentions zero-shot validation on Omni3D and mixed scene-level validation on Real10K, ScanNet++, and DL3DV, and refers to test sets for these datasets (e.g., "10 scenes are selected from SPIn-NeRF [51] test set"), but it does not specify explicit counts or percentages for the training, validation, and test splits of the datasets used for training. |
| Hardware Specification | Yes | "All trainings are accomplished on 8 A800 GPUs." |
| Software Dependencies | No | The paper names the software components and models it builds on (e.g., SD1.5-inpainting, AnimateDiff, RAFT, SAM-tracking), but it does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We train MVInpainter-O and MVInpainter-F for 100k and 60k steps, respectively, with batch size 64, frame number 12, and learning rate 1e-4, taking 3 days and 2 days. Then we fine-tune the model with dynamic frames for 10k steps. All images are resized and cropped to 256×256 for both inpainting and flow extraction. |
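The reported setup is compact enough to restate as a configuration sketch. Below is a minimal, hypothetical Python rendering of the hyperparameters quoted above; the names `TrainConfig`, `CONFIG_O`, `CONFIG_F`, and `make_square_crop` are illustrative assumptions, not identifiers from the authors' code, which had not yet been released.

```python
# Hypothetical sketch of the reported training configuration and 256x256
# preprocessing. Only the numeric values come from the paper; all names
# and structure here are assumptions for illustration.
from dataclasses import dataclass

from PIL import Image


@dataclass
class TrainConfig:
    steps: int                     # 100k (MVInpainter-O) or 60k (MVInpainter-F)
    batch_size: int = 64
    num_frames: int = 12           # frames per multi-view clip
    learning_rate: float = 1e-4
    resolution: int = 256          # images resized and cropped to 256x256
    finetune_steps: int = 10_000   # dynamic-frame fine-tuning


CONFIG_O = TrainConfig(steps=100_000)  # object-centric variant (~3 days)
CONFIG_F = TrainConfig(steps=60_000)   # forward-facing variant (~2 days)


def make_square_crop(img: Image.Image, size: int = 256) -> Image.Image:
    """Resize the short side to `size`, then center-crop to size x size,
    mirroring the paper's 256x256 preprocessing for inpainting and flow."""
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```

A dataclass keeps the two training variants comparable at a glance: the object-centric and forward-facing runs differ only in step count, while the batch size, frame number, learning rate, and resolution are shared.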