Collaborative Score Distillation for Consistent Visual Editing
Authors: Subin Kim, Kyungmin Lee, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, Jinwoo Shin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the effectiveness of CSD in a variety of editing tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models. |
| Researcher Affiliation | Collaboration | Subin Kim¹, Kyungmin Lee¹, June Suk Choi¹, Jongheon Jeong¹, Kihyuk Sohn², Jinwoo Shin¹ (¹KAIST, ²Google Research) |
| Pseudocode | No | The paper describes its methods using mathematical equations and descriptive text, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | Visualizations are available at the website https://subin-kim-cv.github.io/CSD. The website linked in the paper states 'Our code will be publicly available at github.com/subin-kim-cv/CSD', which indicates a planned future release, not current availability. |
| Open Datasets | Yes | For the video editing experiments, we use video sequences from the popular DAVIS [33] dataset at a resolution of 1920 × 1080. |
| Dataset Splits | No | Following Instruct-NeRF2NeRF [39], we first pretrain NeRF using the nerfacto model from NeRFStudio [57], training it for 30,000 steps. Next, we re-initialize the optimizer and finetune the pre-trained NeRF model with edited train views. In contrast to Instruct-NeRF2NeRF, which edits one train view with Instruct-Pix2Pix after every 10 steps of update, we edit a batch of train views (batch size of 16) with CSD-Edit after every 2000 steps of update. The batch is randomly selected among the train views without replacement. The paper describes 'train views' but never specifies a validation set or a formal train/validation/test split for the NeRF finetuning data. (A minimal sketch of this interleaved edit-and-finetune schedule follows the table.) |
| Hardware Specification | Yes | All experiments are conducted on an AMD EPYC 7V13 64-Core Processor and a single NVIDIA A100 80GB. |
| Software Dependencies | No | For the experiments with CSD-Edit, we use the publicly available pre-trained model of Instruct-Pix2Pix [14] by default. We perform CSD-Edit optimization on the output space of the Stable Diffusion [4] autoencoder. Throughout the experiments, we use the OpenCLIP [56] ViT-bigG-14 model for evaluation. Following Instruct-NeRF2NeRF [39], we first pretrain NeRF using the nerfacto model from NeRFStudio [57]... We use the Adan [62] optimizer... The paper names these software components and models but provides no version numbers for any of them. |
| Experiment Setup | Yes | We set t_min = 0.2 and t_max = 0.5, whereas the original SDS optimization for DreamFusion used t_min = 0.2 and t_max = 0.98; editing does not generally require a large scale of noise. We use the guidance scale ω_y ∈ [3.0, 15.0] and image guidance scale ω_s ∈ [1.5, 5.0]. We use a learning rate in [0.25, 2] and optimize for [200, 500] iterations. We use the Adan [62] optimizer with learning-rate warmup over 2000 steps from 10⁻⁹ to 2 × 10⁻³, followed by cosine decay down to 10⁻⁶. We use a batch size of 4 and optimize for 10000 steps in total... Hedged sketches of the guidance computation and the learning-rate schedule also follow the table. |
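The Dataset Splits row describes an interleaved schedule: pretrain nerfacto for 30,000 steps, then alternate NeRF finetuning with re-editing a batch of 16 train views every 2,000 steps, sampled without replacement. Below is a minimal sketch of that loop, assuming hypothetical helpers `csd_edit` and `nerf_step`; the pool-refill behavior is also an assumption, since the quote says only 'without replacement'.

```python
# Minimal sketch of the interleaved finetune/edit schedule quoted in the
# Dataset Splits row. `csd_edit` and `nerf_step` are hypothetical stand-ins,
# not the authors' released API.
import random

FINETUNE_STEPS = 10_000  # total finetuning steps (Experiment Setup row)
EDIT_EVERY = 2_000       # re-edit a batch of train views every 2,000 steps
EDIT_BATCH = 16          # number of train views edited per batch

def csd_edit(view, prompt):
    """Placeholder: CSD-Edit optimization of one train view (sketched below)."""
    return view

def nerf_step(nerf, train_views):
    """Placeholder: one NeRF gradient step on the (partially edited) views."""

def finetune(nerf, train_views, prompt):
    pool = list(range(len(train_views)))  # view indices not yet edited this pass
    for step in range(FINETUNE_STEPS):
        if step % EDIT_EVERY == 0:
            if len(pool) < EDIT_BATCH:
                pool = list(range(len(train_views)))  # refill (assumption)
            batch = random.sample(pool, EDIT_BATCH)   # without replacement
            for i in batch:
                pool.remove(i)
                train_views[i] = csd_edit(train_views[i], prompt)
        nerf_step(nerf, train_views)
```

By contrast, Instruct-NeRF2NeRF edits a single train view every 10 NeRF updates; editing 16 views at once every 2,000 steps is where CSD's inter-sample consistency applies.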
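The Experiment Setup row pairs a restricted noise-level window (t_min = 0.2, t_max = 0.5) with two guidance scales, matching the Instruct-Pix2Pix classifier-free-guidance form with separate text (ω_y) and image (ω_s) scales. Below is a hedged sketch of one score-distillation update direction under those settings; `eps` (the noise predictor) and `add_noise` are assumed interfaces, and the sketch omits the cross-sample coupling that is CSD's actual contribution.

```python
# Hedged sketch, not the released implementation: one SDS-style update
# direction with Instruct-Pix2Pix-style guidance under the quoted settings.
import torch

T_MIN, T_MAX = 0.2, 0.5      # quoted window; DreamFusion's SDS used (0.2, 0.98)
OMEGA_Y, OMEGA_S = 7.5, 1.5  # example values inside the quoted ranges

def sds_direction(eps, add_noise, z, image_cond, text_cond, num_timesteps=1000):
    # Sample a diffusion timestep inside the restricted [T_MIN, T_MAX] window.
    t = torch.randint(int(T_MIN * num_timesteps),
                      int(T_MAX * num_timesteps), (1,))
    noise = torch.randn_like(z)
    z_t = add_noise(z, noise, t)  # forward process q(z_t | z); assumed helper

    # Classifier-free guidance with separate image and text scales,
    # following the Instruct-Pix2Pix formulation.
    e_uncond = eps(z_t, t, None, None)
    e_image = eps(z_t, t, image_cond, None)
    e_full = eps(z_t, t, image_cond, text_cond)
    e_guided = (e_uncond
                + OMEGA_S * (e_image - e_uncond)
                + OMEGA_Y * (e_full - e_image))

    # SDS-style gradient on the latent: guided prediction minus injected noise.
    return e_guided - noise
```

The narrower [0.2, 0.5] window reflects the quoted rationale: editing starts from an existing image, so the large noise levels that generation from scratch requires are unnecessary.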
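Finally, the quoted Adan schedule fixes the endpoints (warmup over 2,000 of 10,000 steps, 10⁻⁹ up to 2 × 10⁻³, cosine decay down to 10⁻⁶) but not the exact shapes; the sketch below assumes linear warmup and a plain half-cosine decay.

```python
# Learning-rate schedule matching the quoted endpoints. Linear warmup and a
# standard half-cosine decay are assumptions; the paper says only
# "warmup ... followed by cosine decay".
import math

WARMUP_STEPS, TOTAL_STEPS = 2_000, 10_000
LR_START, LR_PEAK, LR_END = 1e-9, 2e-3, 1e-6

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return LR_START + (LR_PEAK - LR_START) * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LR_END + 0.5 * (LR_PEAK - LR_END) * (1.0 + math.cos(math.pi * progress))
```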