FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

Authors: Yu Lu, Yuanzhi Liang, Linchao Zhu, Yi Yang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated FreeLong on multiple base video diffusion models and observed significant improvements.
Researcher Affiliation | Academia | Yu Lu (1), Yuanzhi Liang (2), Linchao Zhu (1), Yi Yang (1); (1) The State Key Lab of Brain-Machine Intelligence, Zhejiang University; (2) University of Technology Sydney
Pseudocode | No | The paper describes the algorithm steps in text and equations but does not present them in a structured pseudocode or algorithm block.
Open Source Code | Yes | We have released our code.
Open Datasets | Yes | We chose 200 prompts from VBench [41] to validate the effectiveness of our method.
Dataset Splits | No | The paper does not explicitly provide validation dataset splits for its own experiments; it focuses on adapting pre-trained models and evaluates on a test set.
Hardware Specification | Yes | Moreover, we also examine the inference time of these methods on the NVIDIA A100.
Software Dependencies | No | Table 2 lists "LaVie", "Diffusers", "VideoCrafter", and "ModelScope" as the codebases used, but no specific version numbers are provided for these software components.
Experiment Setup | Yes | We set α = 8 for the local attention setting and set τ to 25. During inference, the parameters of the frequency filter for each model are kept the same for a fair comparison. Specifically, we use a Gaussian Low Pass Filter (GLPF) P_G with a normalized spatiotemporal stop frequency of D0 = 0.25.
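For reference, the sketch below illustrates one way the reported filter setting could be realized: a 3D Gaussian low-pass mask over normalized spatiotemporal frequencies with stop frequency D0 = 0.25, used to blend low-frequency global features with high-frequency local features in the spirit of SpectralBlend. This is a minimal sketch, not the authors' released implementation; the tensor layout, frequency normalization, and function names are assumptions.

```python
import torch

def gaussian_low_pass_filter(shape, d0=0.25):
    """Build a 3D Gaussian low-pass mask over normalized (T, H, W) frequencies.

    Frequencies are normalized to [-1, 1] per axis and the mask is centered at
    zero frequency, so it is meant to multiply an fftshift-ed spectrum.
    """
    T, H, W = shape
    t = torch.linspace(-1.0, 1.0, T).view(T, 1, 1)
    h = torch.linspace(-1.0, 1.0, H).view(1, H, 1)
    w = torch.linspace(-1.0, 1.0, W).view(1, 1, W)
    d2 = t ** 2 + h ** 2 + w ** 2  # squared distance from the spectrum center
    return torch.exp(-d2 / (2 * d0 ** 2))

def spectral_blend(global_feat, local_feat, d0=0.25):
    """Blend low frequencies of global features with high frequencies of local ones.

    Both inputs are assumed to be real tensors whose last three dims are (T, H, W).
    """
    dims = (-3, -2, -1)
    lpf = gaussian_low_pass_filter(global_feat.shape[-3:], d0).to(global_feat.device)
    # Move both feature maps to the spatiotemporal frequency domain.
    g_freq = torch.fft.fftshift(torch.fft.fftn(global_feat, dim=dims), dim=dims)
    l_freq = torch.fft.fftshift(torch.fft.fftn(local_feat, dim=dims), dim=dims)
    # Keep the low-frequency band from the global branch, the rest from the local one.
    blended = g_freq * lpf + l_freq * (1 - lpf)
    blended = torch.fft.ifftshift(blended, dim=dims)
    return torch.fft.ifftn(blended, dim=dims).real
```

With D0 = 0.25, gaussian_low_pass_filter((16, 32, 32)) is close to 1 near the spectrum center and decays toward 0 at high frequencies, so the blend preserves coarse, globally consistent structure from the global branch while retaining fine spatiotemporal detail from the local branch.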