Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Authors: Jihwan Kim, Junoh Kang, Jinyoung Choi, Bohyung Han
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We have demonstrated the promising results and effectiveness of the proposed methods on existing text-to-video generation baselines." and "This section presents the examples generated by existing long video generation methods including FIFO-Diffusion, and evaluates their performance qualitatively and quantitatively." |
| Researcher Affiliation | Academia | Jihwan Kim¹, Junoh Kang¹, Jinyoung Choi¹, Bohyung Han¹,²; Computer Vision Laboratory, ¹ECE & ²IPAI, Seoul National University; EMAIL |
| Pseudocode | Yes | Algorithm 1 FIFO-Diffusion with diagonal denoising (Section 3.1) ... Algorithm 4 FIFO-Diffusion with lookahead denoising (Section 3.3) |
| Open Source Code | Yes | Generated video examples and source codes are available at our project page (https://jjihwan.github.io/projects/FIFO-Diffusion). |
| Open Datasets | Yes | For quantitative evaluation, we measure FVD128 [27] and IS [21] scores using Latte [13] as a base model, which is a DiT-based video model trained on UCF-101 [26]. |
| Dataset Splits | No | The paper uses pretrained models and describes the generation of videos for evaluation (e.g., 'generate 2,048 videos with 128 frames each') but does not specify explicit train/validation/test dataset splits used in their experiments. |
| Hardware Specification | Yes | We adopt VideoCrafter2 as the baseline model, using a DDPM scheduler with 64 inference steps on A6000 GPUs. |
| Software Dependencies | No | The paper mentions software like VideoCrafter1, VideoCrafter2, zeroscope, Open-Sora Plan, LaVie, and SEINE, and uses DDIM sampling, but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | "We employ the DDIM sampling [24] with η ∈ {0.5, 1}." and "We empirically choose n = 4 for the number of partitions in latent partitioning and lookahead denoising." Table 4 also lists specific parameters such as f, n, η, # Prompts, # Frames, and Resolution for the various experiments. |