Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Training-free Camera Control for Video Generation
Authors: Chen Hou, Zhibo Chen
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have demonstrated its superior performance in both video generation and camera motion alignment compared with other finetuned methods. Furthermore, we show the capability of CamTrol to generalize to various base models, as well as its impressive applications in scalable motion control, dealing with complicated trajectories and unsupervised 3D video generation. Videos available at https://lifedecoder.github.io/CamTrol/. |
| Researcher Affiliation | Academia | Chen Hou, Zhibo Chen University of Science and Technology of China {houchen@mail.,chenzhibo@}ustc.edu.cn |
| Pseudocode | Yes | Algorithm 1: Training-free camera control for video generation |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for their method, nor does it provide a direct link to a code repository. The provided URL (https://lifedecoder.github.io/CamTrol/) is for a demo page showcasing videos. |
| Open Datasets | Yes | Specifically, we randomly sample 500 prompt-trajectory pairs from RealEstate10k (Zhou et al., 2018), and use them as references for calculating FVD and FID. |
| Dataset Splits | Yes | Specifically, we randomly sample 500 prompt-trajectory pairs from RealEstate10k (Zhou et al., 2018), and use them as references for calculating FVD and FID. |
| Hardware Specification | Yes | This saves 10-20GB of GPU memory compared to other methods under the same circumstances, allowing it to run on a single RTX 3090. |
| Software Dependencies | Yes | For text prompt input, we use Stable Diffusion v2-1 or Stable Diffusion XL to generate the initial image. The inpainting model we apply is the Stable Diffusion inpainting model proposed by Runway, and the backward step of inpainting is set to 25. We use ZoeDepth as the depth estimation model. |
| Experiment Setup | Yes | For all methods, the number of frames and the decoding size of SVD are set to 14. We use 25 steps for both the inversion and generation processes. We set σ = 1 to encourage diversity in the generation process. The backward step of inpainting is set to 25. In our experiment, we choose (j, k) ∈ [−10, 10] as the size of the patch. |
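The experiment settings quoted above can be collected into a small configuration sketch. The key names below are illustrative assumptions for readability, not the authors' actual code or schema:

```python
# Hypothetical config summarizing the settings quoted in the
# "Experiment Setup" row; field names are illustrative, not the
# authors' actual implementation.
experiment_config = {
    "num_frames": 14,               # frames and SVD decoding size
    "inversion_steps": 25,          # steps for the inversion process
    "generation_steps": 25,         # steps for the generation process
    "sigma": 1.0,                   # σ = 1 to encourage diversity
    "inpainting_backward_step": 25, # backward step of inpainting
    "patch_range": (-10, 10),       # (j, k) ∈ [−10, 10], patch size
}

print(experiment_config["num_frames"])  # → 14
```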