reproducibilityindex.ai

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Authors: Jihwan Kim, Junoh Kang, Jinyoung Choi, Bohyung Han

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	'We have demonstrated the promising results and effectiveness of the proposed methods on existing text-to-video generation baselines.' and 'This section presents the examples generated by existing long video generation methods including FIFO-Diffusion, and evaluates their performance qualitatively and quantitatively.'
Researcher Affiliation	Academia	Jihwan Kim¹, Junoh Kang¹, Jinyoung Choi¹, Bohyung Han¹,²; Computer Vision Laboratory, ¹ECE & ²IPAI, Seoul National University; {kjh26720, junoh.kang, jin0.choi, bhhan}@snu.ac.kr
Pseudocode	Yes	Algorithm 1 FIFO-Diffusion with diagonal denoising (Section 3.1) ... Algorithm 4 FIFO-Diffusion with lookahead denoising (Section 3.3)
Open Source Code	Yes	Generated video examples and source codes are available at our project page (https://jjihwan.github.io/projects/FIFO-Diffusion).
Open Datasets	Yes	For quantitative evaluation, we measure FVD128 [27] and IS [21] scores using Latte [13] as a base model, which is a DiT-based video model trained on UCF-101 [26].
Dataset Splits	No	The paper uses pretrained models and describes the generation of videos for evaluation (e.g., 'generate 2,048 videos with 128 frames each') but does not specify explicit train/validation/test dataset splits used in their experiments.
Hardware Specification	Yes	We adopt VideoCrafter2 as the baseline model, using a DDPM scheduler with 64 inference steps on A6000 GPUs.
Software Dependencies	No	The paper mentions software like VideoCrafter1, VideoCrafter2, zeroscope, Open-Sora Plan, LaVie, and SEINE, and uses DDIM sampling, but does not provide specific version numbers for these software components or libraries.
Experiment Setup	Yes	'We employ the DDIM sampling [24] with η ∈ {0.5, 1}.' and 'We empirically choose n = 4 for the number of partitions in latent partitioning and lookahead denoising.' Table 4 also lists specific parameters such as f, n, η, # Prompts, # Frames, and Resolution for the various experiments.
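
The Experiment Setup row quotes DDIM sampling with η ∈ {0.5, 1}. As a rough illustration of what η controls, the sketch below computes the per-step noise standard deviation from the standard DDIM formulation. It is not taken from the FIFO-Diffusion code: the function name `ddim_sigma`, its parameter names, and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def ddim_sigma(alpha_bar_t: float, alpha_bar_prev: float, eta: float) -> float:
    """Noise standard deviation injected in one DDIM step (standard formulation).

    alpha_bar_t, alpha_bar_prev: cumulative signal coefficients at the current
    timestep and at the next, less noisy one (0 < alpha_bar_t < alpha_bar_prev <= 1).
    eta: stochasticity knob; 0 gives a deterministic step, 1 gives the
    DDPM-style ancestral-sampling variance.
    """
    variance = (
        (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
        * (1.0 - alpha_bar_t / alpha_bar_prev)
    )
    return eta * math.sqrt(variance)


# Illustrative coefficients only; these are assumptions, not values from the paper.
alpha_bar_t, alpha_bar_prev = 0.5, 0.6
for eta in (0.0, 0.5, 1.0):
    print(f"eta={eta}: sigma={ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):.4f}")
```

Under this formulation the injected noise scales linearly with η, so η = 0.5 halves the per-step standard deviation relative to η = 1.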