Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Training-free Camera Control for Video Generation
Authors: Chen Hou, Zhibo Chen
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have demonstrated its superior performance in both video generation and camera motion alignment compared with other finetuned methods. Furthermore, we show the capability of CamTrol to generalize to various base models, as well as its impressive applications in scalable motion control, dealing with complicated trajectories and unsupervised 3D video generation. Videos available at https://lifedecoder.github.io/CamTrol/. |
| Researcher Affiliation | Academia | Chen Hou, Zhibo Chen University of Science and Technology of China {houchen@mail.,chenzhibo@}ustc.edu.cn |
| Pseudocode | Yes | Algorithm 1: Training-free camera control for video generation |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for their method, nor does it provide a direct link to a code repository. The provided URL (https://lifedecoder.github.io/CamTrol/) is for a demo page showcasing videos. |
| Open Datasets | Yes | Specifically, we randomly sample 500 prompt-trajectory pairs from RealEstate10k (Zhou et al., 2018), and use them as references for calculating FVD and FID. |
| Dataset Splits | Yes | Specifically, we randomly sample 500 prompt-trajectory pairs from RealEstate10k (Zhou et al., 2018), and use them as references for calculating FVD and FID. |
| Hardware Specification | Yes | This saves 10-20GB of GPU memory compared to other methods under the same circumstances, allowing it to run on a single RTX 3090. |
| Software Dependencies | Yes | For text prompt input, we use Stable Diffusion v2-1 or Stable Diffusion XL to generate the initial image. The inpainting model we apply is the Stable Diffusion inpainting model proposed by Runway, and the backward step of inpainting is set to 25. We use ZoeDepth as the depth estimation model. |
| Experiment Setup | Yes | For all methods, the number of frames and the decoding size of SVD are set to 14. We use 25 steps for both the inversion and generation processes. We set σ = 1 to encourage diversity in the generation process. The backward step of inpainting is set to 25. In our experiment, we choose (j, k) ∈ [−10, 10] as the size of the patch. |
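The experiment settings quoted above can be collected into a small configuration sketch. The key names below are illustrative assumptions for readability, not the authors' actual code or schema:

```python
# Hypothetical config summarizing the settings quoted in the
# "Experiment Setup" row; field names are illustrative, not the
# authors' actual implementation.
experiment_config = {
    "num_frames": 14,               # frames and SVD decoding size
    "inversion_steps": 25,          # steps for the inversion process
    "generation_steps": 25,         # steps for the generation process
    "sigma": 1.0,                   # σ = 1 to encourage diversity
    "inpainting_backward_step": 25, # backward step of inpainting
    "patch_range": (-10, 10),       # (j, k) ∈ [−10, 10], patch size
}

print(experiment_config["num_frames"])  # → 14
```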