Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

UniTransfer: Video Concept Transfer via Progressive Spatio-Temporal Decomposition

Authors: Guojun Lei, Rong Zhang, Tianhang Liu, Hong Li, Zhiyuan Ma, Chi Wang, Weiwei Xu

NeurIPS 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method achieves high-quality and controllable video concept transfer across diverse reference images and scenes, surpassing existing baselines in both visual fidelity and editability. |
| Researcher Affiliation | Academia | Guojun Lei¹, Rong Zhang², Tianhang Liu¹, Hong Li⁴, Zhiyuan Ma³, Chi Wang⁵, Weiwei Xu¹. ¹ State Key Lab of CAD&CG, Zhejiang University; ² Zhejiang Gongshang University; ³ Huazhong University of Science and Technology; ⁴ Beihang University; ⁵ Ningbo Global Innovation Center, Zhejiang University |
| Pseudocode | Yes | Algorithm 1: The pipeline of our self-supervised pretraining. Algorithm 2: Chain-of-Prompt-Guided Video Denoising |
| Open Source Code | No | We are happy to release our code for the reproduction. Due to time constraints, we plan to organize and disclose the code upon completion of the paper submission process. |
| Open Datasets | Yes | We also curate an animal-centric video dataset called OpenAnimal to facilitate the advancement and benchmarking of research in video concept transfer. In the pre-training phase, we trained our model on the OpenVid dataset [30] |
| Dataset Splits | No | For the human-centric editing task, we fine-tune the model on a combination of the TikTok [17] and UBC Fashion [50] datasets... For the animal-centric task, we performed an additional 10,000 fine-tuning iterations on OpenAnimal using the same learning rate. This indicates which datasets were used for training and fine-tuning, but does not specify how they were split into training, validation, and test sets for the evaluation in this paper. |
| Hardware Specification | Yes | The experiments are conducted on a server equipped with 8 NVIDIA Tesla H100 80G GPUs. |
| Software Dependencies | No | The weights of the model are initialized from a pretrained CogVideoX-5B [48]. We employ an LLM Qwen-QWQ-32B [38] to summarize τ_fine to τ_crs, τ_mid. While specific models are named, the versions of Python, PyTorch, CUDA, or other libraries used for the implementation are not provided. |
| Experiment Setup | Yes | The training used a learning rate of 1 × 10⁻⁴ and ran for 200,000 iterations. The model is finetuned for 10,000 iterations with a learning rate of 5 × 10⁻⁵. |
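
For readers who want to reproduce the schedule, the hyperparameters quoted in the Experiment Setup and Dataset Splits rows can be collected into a small configuration. The following is only a sketch: the learning rates and iteration counts come from the quotes above, but the stage names, the pairing of each stage with a dataset, the reading of "the same learning rate" as 5 × 10⁻⁵, and the constant learning rate within a stage are all assumptions, since the paper's code has not been released.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str             # hypothetical label, not taken from the paper
    datasets: tuple       # dataset names as quoted in the table (pairing assumed)
    learning_rate: float  # value quoted in the table
    iterations: int       # value quoted in the table


# Assumed ordering: pretraining, then human-centric and animal-centric fine-tuning.
STAGES = (
    Stage("pretrain", ("OpenVid",), 1e-4, 200_000),
    Stage("finetune_human", ("TikTok", "UBC Fashion"), 5e-5, 10_000),
    Stage("finetune_animal", ("OpenAnimal",), 5e-5, 10_000),
)


def total_iterations(stages):
    """Sum the iteration counts over all stages."""
    return sum(stage.iterations for stage in stages)


def lr_at(step, stages):
    """Return the learning rate at a 0-indexed global step.

    Assumes a constant rate within each stage; no scheduler is reported.
    """
    start = 0
    for stage in stages:
        if step < start + stage.iterations:
            return stage.learning_rate
        start += stage.iterations
    raise ValueError(f"step {step} is beyond the last stage")


if __name__ == "__main__":
    print(total_iterations(STAGES))
    print(lr_at(200_000, STAGES))
```

Keeping the stages in one immutable structure makes it easy to adjust the assumed pairing once the authors publish their code or clarify the setup.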