Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
Authors: Hyeonho Jeong, Jong Chul Ye
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and applications demonstrate that Ground-A-Video's zero-shot capacity outperforms other baseline methods in terms of edit-accuracy and frame consistency. |
| Researcher Affiliation | Academia | Hyeonho Jeong & Jong Chul Ye, Kim Jaechul Graduate School of AI, KAIST |
| Pseudocode | Yes | Algorithm 1 Optical Flow guided Inverted Latents Smoothing |
| Open Source Code | Yes | Further results and code are available at http://ground-a-video.github.io. |
| Open Datasets | Yes | We use a subset of 20 videos from DAVIS dataset (Pont-Tuset et al., 2017). |
| Dataset Splits | No | The paper mentions using a subset of the DAVIS dataset but does not provide specific train/validation/test split percentages or sample counts for reproducibility. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, or cloud computing instances) used for running its experiments. |
| Software Dependencies | No | The paper mentions several software components and models (e.g., Stable Diffusion v1.4, ControlNet Depth, GLIGEN, RAFT-Large, ZoeDepth, BLIP-2, GLIP, DDIM scheduler) but does not provide specific version numbers for the underlying software stack (e.g., Python, PyTorch/TensorFlow, CUDA). |
| Experiment Setup | Yes | Generated videos are configured to consist of 8 frames, unless explicitly specified, with a uniform resolution of 512x512. ... In the flow-driven inverted latents smoothing stage, the magnitude threshold M_thres is set to 0.2. At inference, DDIM scheduler (Song et al., 2020a) with 50 steps and classifier-free guidance (Ho & Salimans, 2022) of 12.5 scale is used. |
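
The settings quoted in the Experiment Setup row can be collected into a small configuration object, which may help anyone attempting a reproduction. The sketch below is illustrative only: the class name `ReportedSetup`, its field names, and the helper `classifier_free_guidance` are hypothetical and do not come from the paper or its code release. Only the numeric values are taken from the row above. The guidance function shows the standard classifier-free guidance combination (unconditional prediction plus the scaled difference between conditional and unconditional predictions) on plain Python lists, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    num_frames: int = 8                    # frames per generated video
    height: int = 512                      # output resolution (pixels)
    width: int = 512
    flow_magnitude_threshold: float = 0.2  # M_thres in the smoothing stage
    ddim_steps: int = 50                   # DDIM sampling steps at inference
    guidance_scale: float = 12.5           # classifier-free guidance scale


def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance combination, element by element.

    Returns eps_uncond + scale * (eps_cond - eps_uncond) for each pair.
    Plain lists stand in for noise-prediction tensors (an assumption made
    to keep the sketch dependency-free).
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


setup = ReportedSetup()
# Toy noise predictions, chosen only to illustrate the arithmetic.
guided = classifier_free_guidance([0.0, 1.0], [1.0, 1.0], setup.guidance_scale)
print(setup)
print(guided)
```

With a scale of 12.5, any difference between the conditional and unconditional predictions is amplified 12.5 times, while elements where the two predictions agree are left unchanged.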