Towards Consistent Video Editing with Text-to-Image Diffusion Models

Authors: Zicheng Zhang, Bonan Li, Xuecheng Nie, Congying Han, Tiande Guo, Luoqi Liu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superiority of the proposed EI2 model.
Researcher Affiliation | Collaboration | University of Chinese Academy of Sciences; MT Lab, Meitu Inc.
Pseudocode | No | The paper does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'Our implementation of EI2 is based on the stable diffusion v1-4 framework' (Stable Diffusion: https://huggingface.co/CompVis/stable-diffusion-v1-4). This is a reference to a third-party framework used as a base, not the authors' own source code for EI2.
Open Datasets | Yes | Following previous works [56, 28], we collect videos from the DAVIS dataset [34] for comparison. We also gather face videos from the Pexels website to assess the fine-grained editing in the face domain. We utilize a captioning model [27] to automatically generate the text prompts.
Dataset Splits | No | The paper mentions 'perform tuning on 8-frame videos of size 512 × 512' and 'tune WQ of the FFAM and CA Modules, and all parameters of STAMs' but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts for each split).
Hardware Specification | Yes | All experiments are conducted on an NVIDIA Tesla V100 GPU.
Software Dependencies | Yes | Our implementation of EI2 is based on the stable diffusion v1-4 framework. We utilize the AdamW optimizer.
Experiment Setup | Yes | We utilize the AdamW optimizer with a learning rate of 3e-5 for a total of 500 steps. During inference, we initialize the model from the DDIM inversion [15] and set the default classifier-free guidance [17] to 7.5. A hedged code sketch of this setup follows the table.
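To make the reported setup concrete, the sketch below assembles the quoted hyperparameters (Stable Diffusion v1-4 base, AdamW at 3e-5 for 500 steps, classifier-free guidance of 7.5) into runnable form. The authors released no code, so this is only a minimal sketch assuming the HuggingFace diffusers library; the "to_q" parameter selection, the prompt, and num_inference_steps=50 are illustrative assumptions, and the paper's FFAM/STAM modules and DDIM-inversion initialization are not implemented here.

```python
# Minimal sketch of the reported setup; NOT the authors' released code (none exists).
# Assumes the HuggingFace diffusers library as the interface to Stable Diffusion v1-4.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

device = "cuda"  # the paper reports a single NVIDIA Tesla V100

# Base model cited in the paper: https://huggingface.co/CompVis/stable-diffusion-v1-4
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# The paper tunes only the query projections (WQ) of its attention modules plus all
# STAM parameters. Those modules are not public, so as an illustration we restrict
# training to the query projections ("to_q") of the base UNet instead.
trainable = [p for name, p in pipe.unet.named_parameters() if "to_q" in name]
optimizer = torch.optim.AdamW(trainable, lr=3e-5)  # reported: AdamW, learning rate 3e-5

num_tuning_steps = 500  # reported: 500 tuning steps on one 8-frame 512x512 video
# ... per-step loop: compute a denoising loss on the source video frames,
#     then optimizer.step() and optimizer.zero_grad() ...

# Inference: the paper initializes latents from a DDIM inversion of the source video
# (omitted here) and uses a default classifier-free guidance scale of 7.5.
with torch.no_grad():
    frame = pipe(
        prompt="an example editing prompt",  # illustrative placeholder
        guidance_scale=7.5,                  # reported default
        num_inference_steps=50,              # assumed; not stated in the quoted text
    ).images[0]
```

The point the sketch illustrates is that only a small parameter subset is optimized per video while the rest of the pretrained model stays frozen, which is what makes the reported 500-step budget on a single V100 plausible.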