Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

Authors: Pengyang Ling, Jiazi Bu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Tong Wu, Huaian Chen, Jiaqi Wang, Yi Jin

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. Extensive experiments demonstrate that MotionClone exhibits proficiency in both global camera motion and local object motion, with notable superiority in motion fidelity, textual alignment, and temporal consistency. The paper includes sections such as '4 EXPERIMENTS', '4.3 QUALITATIVE COMPARISON', '4.4 QUANTITATIVE COMPARISON', and '4.6 ABLATION AND ANALYSIS' that detail empirical studies and data analysis.
Researcher Affiliation: Academia. 1 University of Science and Technology of China, 2 Shanghai Jiao Tong University, 3 The Chinese University of Hong Kong, 4 Shanghai AI Laboratory.
Pseudocode: No. The paper describes its methods using mathematical equations and prose but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code: Yes. https://github.com/LPengYang/MotionClone
Open Datasets: Yes. For experimental evaluation, 40 real videos sourced from DAVIS (Pont-Tuset et al., 2017) and websites are used for a thorough analysis, comprising 15 videos with camera motion and 25 videos with object motion.
Dataset Splits: No. The paper mentions using 40 real videos for experimental evaluation but does not provide training, validation, or test splits; it describes the total number of videos and their categories, not how they were partitioned for experiments.
Hardware Specification: No. The paper does not provide specific hardware details, such as CPU or GPU models or memory, used for running the experiments.
Software Dependencies: No. The paper mentions using 'AnimateDiff (Guo et al., 2023b) as the base text-to-video generation model' and leveraging 'SparseCtrl (Guo et al., 2023a)' as the image-to-video and sketch-to-video generator, but it does not provide version numbers for these or any other software dependencies such as programming languages, libraries, or frameworks.
Experiment Setup: Yes. For given real videos, a single denoising step at tα = 400 is applied for motion representation extraction. k = 1 is adopted for the mask in Eq. 5 to enforce the sparse constraint. A null-text prompt is uniformly used when preparing motion representations, enabling more convenient video customization. Motion guidance is applied to the temporal attention layers in up_blocks.1; detailed ablations of these settings are presented in Section 4.6. Guidance weights s and λ in Eq. 2 are empirically set to 7.5 and 2000, respectively. For camera motion cloning, the denoising step count is set to 100, with motion guidance applied in the first 50 steps; for object motion cloning, the denoising step count is raised to 300, with motion guidance applied in the first 180 steps.
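The setup row above can be summarized as a small configuration sketch. This is purely illustrative: the function and key names below are hypothetical and do not come from the MotionClone codebase; only the numeric values are taken from the reported setup.

```python
# Hypothetical sketch of the reported MotionClone hyperparameters.
# Function and dictionary key names are illustrative assumptions,
# not identifiers from the actual repository.

def motion_clone_config(mode: str) -> dict:
    """Return the reported settings for camera vs. object motion cloning."""
    if mode not in ("camera", "object"):
        raise ValueError("mode must be 'camera' or 'object'")
    cfg = {
        "t_alpha": 400,        # single denoising step for motion extraction
        "mask_k": 1,           # k for the sparse-constraint mask in Eq. 5
        "prompt": "",          # null text while preparing motion representations
        "guidance_layers": "up_blocks.1",  # temporal attention layers used
        "guidance_s": 7.5,     # guidance weight s in Eq. 2
        "guidance_lambda": 2000,  # guidance weight lambda in Eq. 2
    }
    if mode == "camera":
        cfg.update(denoise_steps=100, guidance_steps=50)
    else:
        cfg.update(denoise_steps=300, guidance_steps=180)
    return cfg
```

Keeping the mode-dependent step counts in one place makes it clear that object motion cloning uses both more denoising steps and a longer guidance window than camera motion cloning.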