FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
Authors: Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 EXPERIMENTS", "Table 1: Quantitative results on TGVE-D and TGVE-V.", "We compare our approach with 5 publicly available text-to-video editing methods", "We conduct extensive experiments to validate the effectiveness of our method." |
| Researcher Affiliation | Collaboration | ¹Leibniz University Hannover, ²Meta AI, ³The University of Hong Kong, ⁴Nanyang Technological University |
| Pseudocode | No | The paper describes the methodology in prose and diagrams (e.g., Figure 3, Figure 4) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The project page is available at https://flatten-video-editing.github.io/. |
| Open Datasets | Yes | We evaluate our text-to-video editing framework with 53 videos sourced from LOVEU-TGVE*. 16 of these videos are from DAVIS (Perazzi et al., 2016), and we denote this subset as TGVE-D. The other 37 videos are from Videvo, which are denoted as TGVE-V. *https://sites.google.com/view/loveucvpr23/track4 |
| Dataset Splits | No | The paper uses 53 videos but does not provide explicit training, validation, and test dataset splits, only describing the total number and characteristics of the videos used for evaluation. |
| Hardware Specification | Yes | The runtime of the different models at different stages on a single A100 GPU is shown in Table 5. |
| Software Dependencies | No | The paper mentions using specific software such as RAFT and xFormers but does not provide version numbers (see the hedged RAFT sketch after the table). |
| Experiment Setup | Yes | We implement 100 timesteps for DDIM inversion and 50 timesteps for DDIM sampling. (See the hedged DDIM sketch after the table.) |
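
The Software Dependencies row notes that the paper names RAFT (for optical-flow extraction) and xFormers without pinning versions. Below is a minimal sketch of per-frame flow estimation, assuming torchvision's RAFT implementation; which RAFT build and weights the authors actually used is not stated in the paper.

```python
# A minimal sketch, assuming torchvision's RAFT rather than the original
# princeton-vl repo; which RAFT build/weights the authors used is not stated.
import torch
from torchvision.models.optical_flow import Raft_Large_Weights, raft_large

weights = Raft_Large_Weights.DEFAULT
transforms = weights.transforms()
model = raft_large(weights=weights).eval()

@torch.no_grad()
def estimate_flow(frame1, frame2):
    """Dense flow between two (1, 3, H, W) frames; H and W divisible by 8."""
    img1, img2 = transforms(frame1, frame2)  # normalize frames to [-1, 1]
    # RAFT returns a list of iterative flow refinements; the last is finest.
    return model(img1, img2)[-1]  # (1, 2, H, W), flow in pixels
```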
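
For the Experiment Setup row, this is a minimal sketch of the stated schedule (100 DDIM-inversion timesteps, then 50 DDIM-sampling timesteps) using Hugging Face diffusers. The Stable Diffusion base model and the `ddim_invert`/`ddim_sample` helpers are assumptions for illustration, not the authors' released code, and FLATTEN's flow-guided attention is omitted.

```python
# A minimal sketch of the stated schedule, assuming a Stable Diffusion base
# model and Hugging Face diffusers; the helpers below are illustrative, not
# the authors' code, and flow-guided attention is omitted.
import torch
from diffusers import DDIMInverseScheduler, DDIMScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    torch_dtype=torch.float16,
).to("cuda")

inv_sched = DDIMInverseScheduler.from_config(pipe.scheduler.config)
inv_sched.set_timesteps(100)  # 100 timesteps for DDIM inversion
gen_sched = DDIMScheduler.from_config(pipe.scheduler.config)
gen_sched.set_timesteps(50)   # 50 timesteps for DDIM sampling

@torch.no_grad()
def ddim_invert(latents, text_emb):
    # Deterministically walk clean frame latents forward to noise.
    for t in inv_sched.timesteps:
        eps = pipe.unet(latents, t, encoder_hidden_states=text_emb).sample
        latents = inv_sched.step(eps, t, latents).prev_sample
    return latents

@torch.no_grad()
def ddim_sample(latents, text_emb):
    # Denoise the inverted latents under the editing prompt.
    for t in gen_sched.timesteps:
        eps = pipe.unet(latents, t, encoder_hidden_states=text_emb).sample
        latents = gen_sched.step(eps, t, latents).prev_sample
    return latents
```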