Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

Authors: Min Zhao, Hongzhou Zhu, Chendong Xiang, Kaiwen Zheng, Chongxuan LI, Jun Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate these general strategies on various I2V-DMs on our collected open-domain image benchmark and the UCF101 dataset. Extensive results show that our methods outperform baselines by producing higher motion scores with lower errors while maintaining image alignment and temporal consistency, thereby yielding superior overall performance and enabling more accurate motion control."
Researcher Affiliation | Collaboration | Min Zhao (1,3), Hongzhou Zhu (1,3), Chendong Xiang (1,3), Kaiwen Zheng (1,3), Chongxuan Li (2), Jun Zhu (1,3,4). Affiliations: (1) Dept. of Comp. Sci. & Tech., BNRist Center, THU-Bosch ML Center, Tsinghua University; (2) Gaoling School of Artificial Intelligence, Renmin University of China, and Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; (3) ShengShu, Beijing, China; (4) Pazhou Laboratory (Huangpu), Guangzhou, China.
Pseudocode | Yes | "Algorithm 1: Sampling from an I2V diffusion model with Analytic-Init" (a hedged sampling sketch follows the table).
Open Source Code | Yes | Project page: https://cond-image-leak.github.io/. "All used codes in this paper and their licenses are listed in Tab. 3."
Open Datasets | Yes | "We use WebVid-2M [2] as the training dataset... For evaluation, we use UCF101 [49] and our Image Bench dataset."
Dataset Splits | No | The paper names WebVid-2M as the training dataset and UCF101 and Image Bench for evaluation, along with sample counts for FVD and IS on UCF101, but it does not explicitly provide train/validation/test splits (percentages or counts) for its experiments.
Hardware Specification | Yes | "Our experiments were conducted using A800-80G GPUs, and the computational costs are detailed in Tab. 6." Table 6 (compute resources): DynamiCrafter [63]: 20,000 iterations, 8× A800, 8 hours; VideoCrafter1 [12]: 20,000 iterations, 8× A800, 8 hours; SVD [9]: 20,000 iterations, 6× A800, 7 hours.
Software Dependencies | No | The paper lists the existing model implementations it builds on and their licenses in Table 3, but it does not specify programming-language versions (e.g., Python 3.x) or library versions (e.g., PyTorch 1.x) required to replicate the experiments.
Experiment Setup | Yes | Table 4 (training settings for DynamiCrafter [63] and VideoCrafter1 [12]): optimizer AdamW; learning rate 1e-5; weight decay 1e-2; optimizer momentum β1, β2 = 0.9, 0.999; batch size 64; training iterations 20,000. (A hedged training-setup sketch follows the table.)
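
The Pseudocode row cites Algorithm 1, which is not reproduced in this report. The sketch below shows one way such a sampler could look: generation starts at an earlier time `t_start` and is initialized from the condition-image latent plus Gaussian noise rather than from pure noise. The toy schedule `alpha_sigma`, the `init_std_scale` knob (standing in for an analytically derived initial variance), and the placeholder `denoiser` are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of DDIM-style sampling with an "Analytic-Init"-like start.
# Assumptions (not from the paper): the alpha/sigma schedule, the init_std_scale
# knob, and the placeholder denoiser interface.
import torch


def alpha_sigma(t):
    """Toy VP-style schedule on t in [0, 1] (assumption, not the paper's schedule)."""
    return torch.cos(0.5 * torch.pi * t), torch.sin(0.5 * torch.pi * t)


@torch.no_grad()
def sample_with_analytic_init(denoiser, cond_latent, t_start=0.85, num_steps=25,
                              init_std_scale=1.0):
    """Start sampling at t_start < 1 from a noised copy of the condition latent.

    init_std_scale stands in for an analytically chosen initial noise scale; here
    it is just a user-set hyperparameter (illustrative assumption).
    """
    batch = cond_latent.shape[0]
    ts = torch.linspace(t_start, 0.0, num_steps + 1)
    a0, s0 = alpha_sigma(ts[0])
    # Initialize around the scaled condition latent instead of pure Gaussian noise.
    x = a0 * cond_latent + init_std_scale * s0 * torch.randn_like(cond_latent)
    for i in range(num_steps):
        a_cur, s_cur = alpha_sigma(ts[i])
        a_next, s_next = alpha_sigma(ts[i + 1])
        t_batch = torch.full((batch,), ts[i].item())
        eps = denoiser(x, t_batch, cond_latent)      # predicted noise
        x0_pred = (x - s_cur * eps) / a_cur          # predicted clean latent
        x = a_next * x0_pred + s_next * eps          # deterministic DDIM step
    return x


if __name__ == "__main__":
    # Dummy denoiser standing in for the I2V U-Net; shapes: (batch, C, frames, H, W).
    dummy = lambda x, t, c: torch.zeros_like(x)
    cond = torch.randn(1, 4, 16, 32, 32)
    print(sample_with_analytic_init(dummy, cond).shape)
```

Starting below the terminal noise level with a condition-anchored initialization is the piece meant to echo the paper's inference-time strategy against conditional image leakage; the rest is an ordinary deterministic sampling loop.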
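
The Experiment Setup row collects the fine-tuning hyperparameters from Table 4. As a rough illustration of how they could be wired together, the snippet below builds an AdamW optimizer with those values in PyTorch; the `nn.Conv3d` stand-in model, the random batch, and the squared-activation loss are placeholders, and only the optimizer settings, batch size, and iteration count come from the table.

```python
# Hedged sketch: Table 4 hyperparameters in a PyTorch training setup.
# The model, data, and loss are placeholders; only the numeric settings are
# taken from the reported table.
import torch
from torch import nn

model = nn.Conv3d(4, 4, kernel_size=3, padding=1)  # stand-in for the I2V U-Net

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,             # learning rate (Table 4)
    betas=(0.9, 0.999),  # optimizer momentum beta1, beta2 (Table 4)
    weight_decay=1e-2,   # weight decay (Table 4)
)

batch_size = 64               # Table 4
training_iterations = 20_000  # Table 4

for step in range(training_iterations):
    # Placeholder batch of video latents: (batch, channels, frames, H, W).
    latents = torch.randn(batch_size, 4, 16, 32, 32)
    loss = model(latents).pow(2).mean()  # dummy objective, not the diffusion loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step == 0:
        break  # demo only; remove to run the full 20,000-iteration schedule
```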