FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
Authors: Yu Lu, Yuanzhi Liang, Linchao Zhu, Yi Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated FreeLong on multiple base video diffusion models and observed significant improvements. |
| Researcher Affiliation | Academia | Yu Lu (1), Yuanzhi Liang (2), Linchao Zhu (1), Yi Yang (1); (1) The State Key Lab of Brain-Machine Intelligence, Zhejiang University; (2) University of Technology Sydney |
| Pseudocode | No | The paper describes the algorithm steps in text and equations but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | We have released our code. |
| Open Datasets | Yes | We chose 200 prompts from VBench [41] to validate the effectiveness of our method. |
| Dataset Splits | No | The paper does not explicitly provide validation dataset splits for its own experiments. It focuses on adapting pre-trained models and evaluates on a test set. |
| Hardware Specification | Yes | Moreover, we also examine the inference time of these methods on the NVIDIA A100. |
| Software Dependencies | No | Table 2 lists "LaVie", "Diffusers", "VideoCrafter", and "ModelScope" as the codebases used, but no version numbers are provided for these software components. |
| Experiment Setup | Yes | We set α = 8 for the local attention setting and set τ to 25. During inference, the parameters of the frequency filter for each model are kept the same for a fair comparison. Specifically, we use a Gaussian Low Pass Filter (GLPF) P_G with a normalized spatiotemporal stop frequency of D0 = 0.25 (a sketch of this filter follows the table). |
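
For context, the frequency filter quoted in the last row can be illustrated as follows: a minimal PyTorch sketch of a 3D Gaussian low-pass mask with normalized spatiotemporal stop frequency D0 = 0.25, used to keep the low frequencies of a global feature branch and the high frequencies of a local branch. The function names (`gaussian_low_pass_mask`, `spectral_blend`), the tensor layout, and the exact blending rule are assumptions for illustration, not the authors' released implementation.

```python
import torch

def gaussian_low_pass_mask(shape, d0=0.25):
    """3D Gaussian low-pass mask over normalized spatiotemporal frequencies.

    `shape` is (frames, height, width); `d0` is the normalized stop frequency
    (0.25 in the reported setup). Frequencies are fftshift-centered so the
    zero frequency sits in the middle of the mask.
    """
    t, h, w = shape
    ft = torch.fft.fftshift(torch.fft.fftfreq(t))
    fh = torch.fft.fftshift(torch.fft.fftfreq(h))
    fw = torch.fft.fftshift(torch.fft.fftfreq(w))
    gt, gh, gw = torch.meshgrid(ft, fh, fw, indexing="ij")
    d2 = gt**2 + gh**2 + gw**2            # squared distance from zero frequency
    return torch.exp(-d2 / (2 * d0**2))   # Gaussian roll-off controlled by d0

def spectral_blend(global_feat, local_feat, d0=0.25):
    """Blend two (frames, height, width, channels) feature tensors in frequency space:
    low frequencies come from the global branch, high frequencies from the local branch."""
    mask = gaussian_low_pass_mask(global_feat.shape[:3], d0)[..., None]
    g = torch.fft.fftshift(torch.fft.fftn(global_feat, dim=(0, 1, 2)), dim=(0, 1, 2))
    l = torch.fft.fftshift(torch.fft.fftn(local_feat, dim=(0, 1, 2)), dim=(0, 1, 2))
    blended = g * mask + l * (1 - mask)
    blended = torch.fft.ifftshift(blended, dim=(0, 1, 2))
    return torch.fft.ifftn(blended, dim=(0, 1, 2)).real
```

A hypothetical usage example: `spectral_blend(global_attn_out, local_attn_out, d0=0.25)` on two attention outputs of shape (64, 32, 32, 320) would return a tensor of the same shape whose temporal consistency follows the global branch while fine detail follows the local branch.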