FRAG: Frequency Adapting Group for Diffusion Video Editing

Authors: Sunjae Yoon, Gwanhyeong Koo, Geonwoo Kim, Chang D. Yoo

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | FRAG operates in a model-agnostic manner without additional training, and its effectiveness is validated on video editing benchmarks (i.e., TGVE, DAVIS). Additionally, Table 1 presents evaluations of edited videos on DAVIS and TGVE from recent editing systems with FRAG across five criteria...
Researcher Affiliation | Academia | Sunjae Yoon, Gwanhyeong Koo, Geonwoo Kim, Chang D. Yoo; Korea Advanced Institute of Science and Technology (KAIST), South Korea. Correspondence to: Chang D. Yoo <cd_yoo@kaist.ac.kr>.
Pseudocode | Yes | Algorithm 1 (Agglomerative Hierarchical Clustering) is provided in Appendix C; a generic sketch of this clustering step appears after the table.
Open Source Code | No | The paper states that FRAG is validated on recent editing systems using "their public codes and papers", but it provides no explicit statement or link to open-source code for FRAG itself.
Open Datasets | Yes | We validated videos using the TGVE and DAVIS datasets, both of which are video editing challenge datasets containing 32 to 128 frames each (https://sites.google.com/view/loveucvpr23/track4).
Dataset Splits | No | The paper mentions using the TGVE and DAVIS datasets for validation and evaluation but does not specify how the data was split into training, validation, and test sets for the experiments.
Hardware Specification | Yes | The experimental settings are W = H = 64, L = 48, C = 4, σ = 0.25, d0 = 6 on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions using CLIP (Radford et al., 2021) for the text encoder, VQ-VAE (Van Den Oord et al., 2017) for image frame encoding, and a pretrained Stable Diffusion v2.1 (Rombach et al., 2022). While these are specific models, it does not provide version numbers for general software dependencies such as Python, PyTorch, or CUDA (a hypothetical environment sketch follows the table).
Experiment Setup | Yes | The experimental settings are W = H = 64, L = 48, C = 4, σ = 0.25, d0 = 6 on an NVIDIA A100 GPU. We set the margin d0 = 6 for the Spatial Moment Adaption, which was the most effective in our framework... We adjusted the minimum size of the temporal group along the frame axis to a range between 1 and 4, according to video quality enhancement modules. A configuration sketch collecting these values follows the examples below.
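
The Pseudocode row points to Appendix C for Algorithm 1. Below is a minimal, generic sketch of agglomerative hierarchical clustering, assuming single-linkage Euclidean distances over per-item feature vectors (e.g., per-frame features); the function name, distance measure, and stopping rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def agglomerative_cluster(features: np.ndarray, num_clusters: int) -> list[set[int]]:
    """Generic bottom-up (single-linkage) clustering over item features.

    features: (N, D) array, one feature vector per item (e.g., per frame).
    num_clusters: stop merging once this many clusters remain.
    """
    clusters = [{i} for i in range(len(features))]

    def linkage(a: set[int], b: set[int]) -> float:
        # Single linkage: distance between the closest pair across clusters.
        return min(np.linalg.norm(features[i] - features[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters.pop(j)  # j > i, so index i stays valid
    return clusters

# Example: cluster six scalar "frames" into two groups.
feats = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
print(agglomerative_cluster(feats, num_clusters=2))  # -> [{0, 1, 2}, {3, 4, 5}]
```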
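The models named in the Software Dependencies row can be assembled in several ways; the sketch below uses Hugging Face diffusers as one plausible stack. The library choice and model ID are assumptions, since the paper does not state its dependencies or versions.

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical stack: the paper cites the pretrained models but not the
# libraries used to load them; `diffusers` is one common choice.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # pretrained Stable Diffusion v2.1
    torch_dtype=torch.float16,
).to("cuda")  # e.g., the NVIDIA A100 named in the experimental settings

# The pipeline bundles components corresponding to those the paper cites:
#   pipe.text_encoder -> CLIP-style text encoder (Radford et al., 2021)
#   pipe.vae          -> latent image codec (the paper cites VQ-VAE)
#   pipe.unet         -> the denoising diffusion backbone
```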
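Finally, the settings quoted in the Hardware Specification and Experiment Setup rows can be gathered into one configuration object. The dataclass and field names below are our shorthand for the paper's symbols, not an interface taken from its code; the comments on what each symbol denotes are guesses beyond W, H, L, C being the latent dimensions quoted together.

```python
from dataclasses import dataclass

@dataclass
class FRAGSettings:
    # Values as quoted from the paper's experimental settings.
    W: int = 64          # width (presumably of the latent frame)
    H: int = 64          # height
    L: int = 48          # number of frames
    C: int = 4           # channels
    sigma: float = 0.25  # σ in the paper
    d0: int = 6          # margin for the spatial moment adaption
    min_group_range: tuple[int, int] = (1, 4)  # min temporal group size along the frame axis
```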