Towards Consistent Video Editing with Text-to-Image Diffusion Models
Authors: Zicheng Zhang, Bonan Li, Xuecheng Nie, Congying Han, Tiande Guo, Luoqi Liu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of the proposed EI2 model. |
| Researcher Affiliation | Collaboration | 1University of Chinese Academy of Sciences 2MT Lab, Meitu Inc. |
| Pseudocode | No | The paper does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'Our implementation of EI2 is based on the stable diffusion v1-4 framework' (footnote: Stable Diffusion, https://huggingface.co/CompVis/stable-diffusion-v1-4). This is a reference to a third-party framework used as a base, not the authors' own source code for EI2. |
| Open Datasets | Yes | Following previous works [56, 28], we collect videos from the DAVIS dataset [34] for comparison. We also gather face videos from the Pexels website to assess the fine-grained editing in the face domain. We utilize a captioning model [27] to automatically generate the text prompts. |
| Dataset Splits | No | The paper mentions 'perform tuning on 8-frame videos of size 512 × 512' and 'tune W_Q of the FFAM and CA Modules, and all parameters of STAMs' but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA Tesla V100 GPU. |
| Software Dependencies | Yes | Our implementation of EI2 is based on the stable diffusion v1-4 framework (Stable Diffusion: https://huggingface.co/CompVis/stable-diffusion-v1-4). We utilize the AdamW optimizer. |
| Experiment Setup | Yes | We utilize the AdamW optimizer with a learning rate of 3e-5 for a total of 500 steps. During inference, we initialize the model from the DDIM inversion [15] and set the default classifier-free guidance [17] to 7.5. (A sketch of this setup follows the table.) |
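The experiment-setup details reported above are concrete enough for a rough reconstruction. Below is a minimal sketch, assuming the Hugging Face `diffusers` library; since the authors released no code, the choice of tunable parameters (`to_q` projections as a stand-in for the W_Q weights of the FFAM and CA modules plus the STAM parameters) is an illustrative assumption, not the authors' implementation.

```python
# A minimal sketch of the reported setup, assuming the `diffusers` library.
# The authors did not release code, so the parameter selection below is a
# stand-in for "tune W_Q of the FFAM and CA Modules, and all parameters
# of STAMs" from the paper.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Base model named in the paper: Stable Diffusion v1-4.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# DDIM scheduler, since inference is initialized from DDIM inversion.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# Freeze the U-Net, then unfreeze only the attention query projections
# as an illustrative proxy for the paper's tunable weights.
pipe.unet.requires_grad_(False)
trainable = []
for name, param in pipe.unet.named_parameters():
    if "to_q" in name:
        param.requires_grad_(True)
        trainable.append(param)

# Reported optimizer settings: AdamW, learning rate 3e-5, 500 steps.
optimizer = torch.optim.AdamW(trainable, lr=3e-5)
num_steps = 500

# Inference with the reported default classifier-free guidance of 7.5.
# (The DDIM-inversion initialization of the latents is omitted here.)
result = pipe("an edited description of the video", guidance_scale=7.5)
```

The tuning loop itself (the denoising objective over 8-frame 512 × 512 clips) is omitted, as the paper's table entries do not specify the exact loss used by EI2.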