Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ZeroPatcher: Training-free Sampler for Video Inpainting and Editing

Authors: Shaoshu Yang, Yingya Zhang, Ran He

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive quantitative and qualitative evaluations confirm that our method achieves outstanding video inpainting and editing performance in a plug-and-play fashion.
Researcher Affiliation | Collaboration | Shaoshu Yang (1, 3), Yingya Zhang (2), and Ran He (1, 3). (1) School of Artificial Intelligence, University of Chinese Academy of Sciences; (2) Tongyi Lab; (3) New Laboratory of Pattern Recognition (NLPR), CASIA
Pseudocode | Yes | E. Algorithmic description of CD-EM: We show an algorithmic description of CD-EM in Algo. 1. It provides the clean CD-EM without back projection and latent mask fuser.
Open Source Code | No | Code and checkpoints will be released after the review process.
Open Datasets | Yes | We perform extensive evaluations on the DAVIS (22) and YouTube-VOS (35) datasets to assess both the inpainting and editing capabilities of our method.
Dataset Splits | Yes | For video inpainting evaluation, we conduct experiments on DAVIS (22) (50 videos) and YouTube-VOS (35) (508 videos) using their original splits (44).
Hardware Specification | Yes | We use 8 NVIDIA A100 80G GPUs to run in parallel during inference to get results faster.
Software Dependencies | No | The paper mentions using "CogVideoX" and "HunyuanVideo" as diffusion models, and "discrete euler sampler" and "DDIM diffusion sampler," but does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | The inference hyperparameters are shown in Table 6. We use the DDIM diffusion sampler with sampling stochasticity factor η = 1.0. Our experiment shows that using η = 1.0 can greatly increase video consistency, since the model will not follow the deterministic sampling trajectory. CD-EM is not required in all denoising timesteps. We find that using CD-EM in early sampling stages can already produce plausible results. Therefore, we add a stopping step for CD-EM at 25 to save computation without losing noticeable performance. With CD-EM hyperparameters set to P = 2, N = 1, K = 1, we ensure every denoising step with CD-EM will only cost 3 NFEs. With the remaining 25 steps using only 1 NFE, sampling through ZeroPatcher requires 100 NFEs in total.
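
The NFE (number of function evaluations) total quoted in the Experiment Setup row can be checked with a little arithmetic. The sketch below is only an illustration, not the authors' code. It assumes a 50-step schedule, which is inferred from the quoted figures (CD-EM stops at step 25, and 25 steps remain) rather than stated in the row. It also takes the cost of 3 NFEs per CD-EM step as given, because the row does not say how that cost follows from P, N, and K. The function name and parameters are hypothetical.

```python
def total_nfes(total_steps: int, cdem_stop_step: int,
               nfes_per_cdem_step: int, nfes_per_plain_step: int = 1) -> int:
    """Count network evaluations for a sampler that applies CD-EM only
    during the first `cdem_stop_step` denoising steps.

    All names here are hypothetical; only the numbers in the example
    call below come from the quoted text.
    """
    if not 0 <= cdem_stop_step <= total_steps:
        raise ValueError("cdem_stop_step must lie between 0 and total_steps")
    cdem_steps = cdem_stop_step                 # early steps that run CD-EM
    plain_steps = total_steps - cdem_stop_step  # remaining ordinary steps
    return cdem_steps * nfes_per_cdem_step + plain_steps * nfes_per_plain_step


# Assumed schedule: 50 steps in total (inferred), CD-EM on the first 25 at
# 3 NFEs each, and the remaining 25 steps at 1 NFE each.
print(total_nfes(total_steps=50, cdem_stop_step=25, nfes_per_cdem_step=3))  # 100
```

Under these assumptions the count is 25 × 3 + 25 × 1 = 100, matching the total quoted above. For comparison, the same schedule with CD-EM at every step would cost 150 NFEs.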