Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Authors: Pengyang Ling, Jiazi Bu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Tong Wu, Huaian Chen, Jiaqi Wang, Yi Jin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that MotionClone exhibits proficiency in both global camera motion and local object motion, with notable superiority in terms of motion fidelity, textual alignment, and temporal consistency. The paper includes sections such as '4 EXPERIMENTS', '4.3 QUALITATIVE COMPARISON', '4.4 QUANTITATIVE COMPARISON', and '4.6 ABLATION AND ANALYSIS' which detail empirical studies and data analysis. |
| Researcher Affiliation | Academia | 1University of Science and Technology of China 2Shanghai Jiao Tong University 3The Chinese University of Hong Kong 4Shanghai AI Laboratory |
| Pseudocode | No | The paper describes methods using mathematical equations and prose but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | https://github.com/LPengYang/MotionClone |
| Open Datasets | Yes | For experimental evaluation, 40 real videos sourced from DAVIS (Pont-Tuset et al., 2017) and public websites are utilized for a thorough analysis, comprising 15 videos with camera motion and 25 videos with object motion. |
| Dataset Splits | No | The paper mentions using 40 real videos for experimental evaluation but does not provide specific details on training, validation, or test dataset splits. It describes the total number of videos and their categories but not how they were partitioned for experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU, GPU models, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'AnimateDiff (Guo et al., 2023b) as the base text-to-video generation model' and leveraging 'SparseCtrl (Guo et al., 2023a)' for image-to-video and sketch-to-video generation. However, it does not provide specific version numbers for these or any other software dependencies such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | For given real videos, a single denoising step at tα = 400 is applied for motion-representation extraction. k = 1 is adopted for the mask in Eq. 5 to facilitate the sparse constraint. A null-text prompt is uniformly used for preparing motion representations, promoting more convenient video customization. Motion guidance is applied on the temporal attention layers in up_blocks.1; detailed ablations of these settings appear in Section 4.6. The guidance weights s and λ in Eq. 2 are empirically set to 7.5 and 2000, respectively. For camera motion cloning, the denoising step count is 100, with motion guidance applied in the first 50 steps. For object motion cloning, the denoising step count is raised to 300, with motion guidance applied in the early 180 steps. |
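The hyperparameters in the Experiment Setup row can be summarized as a small configuration sketch. This is not code from the authors' repository; the field names (`t_alpha`, `mask_top_k`, etc.) are illustrative labels for the values reported in the paper, assuming the camera/object split described above.

```python
from dataclasses import dataclass

@dataclass
class MotionCloneConfig:
    """Hyperparameters as reported in the paper; names are illustrative."""
    t_alpha: int = 400                 # single denoising step for motion-representation extraction
    mask_top_k: int = 1                # k for the sparse mask in Eq. 5
    guidance_weight_s: float = 7.5     # guidance weight s in Eq. 2
    guidance_weight_lambda: float = 2000.0  # guidance weight λ in Eq. 2
    guided_layers: str = "up_blocks.1" # temporal attention layers receiving motion guidance
    denoising_steps: int = 100         # total denoising steps
    motion_guidance_steps: int = 50    # early steps with motion guidance applied

def config_for(motion_type: str) -> MotionCloneConfig:
    """Per-motion-type schedule described in the setup (hypothetical helper)."""
    if motion_type == "camera":
        return MotionCloneConfig(denoising_steps=100, motion_guidance_steps=50)
    if motion_type == "object":
        return MotionCloneConfig(denoising_steps=300, motion_guidance_steps=180)
    raise ValueError(f"unknown motion type: {motion_type}")
```

For example, `config_for("object")` yields the longer 300-step schedule with guidance in the first 180 steps, matching the object-motion setting quoted above.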