FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

Authors: Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "4 EXPERIMENTS", "Table 1: Quantitative results on TGVE-D and TGVE-V.", "We compare our approach with 5 publicly available text-to-video editing methods", "We conduct extensive experiments to validate the effectiveness of our method." |
| Researcher Affiliation | Collaboration | Leibniz University Hannover, Meta AI, The University of Hong Kong, Nanyang Technological University |
| Pseudocode | No | The paper describes the methodology in prose and diagrams (e.g., Figure 3, Figure 4) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The project page is available at https://flatten-video-editing.github.io/. |
| Open Datasets | Yes | We evaluate our text-to-video editing framework with 53 videos sourced from LOVEU-TGVE (https://sites.google.com/view/loveucvpr23/track4). 16 of these videos are from DAVIS (Perazzi et al., 2016), and we denote this subset as TGVE-D. The other 37 videos are from Videvo, which are denoted as TGVE-V. |
| Dataset Splits | No | The paper evaluates on 53 videos but does not provide explicit training/validation/test splits; it only describes the total number and characteristics of the videos used for evaluation. |
| Hardware Specification | Yes | The runtime of the different models at different stages on a single A100 GPU is shown in Table 5. |
| Software Dependencies | No | The paper mentions specific software such as RAFT and xFormers but does not provide their version numbers (see the optical-flow sketch after this table). |
| Experiment Setup | Yes | We implement 100 timesteps for DDIM inversion and 50 timesteps for DDIM sampling. (See the scheduler sketch after this table.) |
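The paper's flow guidance relies on RAFT-estimated optical flow between consecutive frames. As a minimal sketch of how such flow can be obtained, assuming torchvision's RAFT implementation rather than the authors' unspecified version and pipeline:

```python
# Minimal optical-flow sketch using torchvision's RAFT (an assumption; the paper
# does not state which RAFT implementation or version it uses).
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()
preprocess = weights.transforms()

# Two consecutive video frames as float tensors in [0, 1], shape (N, 3, H, W);
# random tensors stand in for real frames here. H and W must be divisible by 8.
frame_t = torch.rand(1, 3, 384, 512)
frame_t1 = torch.rand(1, 3, 384, 512)
img1, img2 = preprocess(frame_t, frame_t1)

with torch.no_grad():
    flow_predictions = model(img1, img2)  # list of iteratively refined flow fields
flow = flow_predictions[-1]               # (N, 2, H, W): per-pixel (dx, dy) displacement
```

In FLATTEN, such per-pixel displacements guide attention along motion trajectories; the snippet above covers only the flow-estimation step.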
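The reported timestep configuration (100 DDIM inversion steps, 50 DDIM sampling steps) could be set up as follows. This is a hedged sketch assuming a Stable Diffusion backbone accessed through the diffusers library with a placeholder checkpoint id; it is not the authors' released code.

```python
# Sketch of the reported DDIM timestep configuration (100 inversion / 50 sampling steps),
# assuming a Stable Diffusion backbone via diffusers; the checkpoint id is a placeholder.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, DDIMInverseScheduler

model_id = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint, not specified by the paper excerpt
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# DDIM inversion: map each frame's latent back toward noise over 100 timesteps.
inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)
inverse_scheduler.set_timesteps(100)

# DDIM sampling: generate the edited frames over 50 timesteps.
sampling_scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
sampling_scheduler.set_timesteps(50)
pipe.scheduler = sampling_scheduler
```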