Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement

Authors: Teng Hu, Zhentao Yu, Zhengguang Zhou, Jiangning Zhang, Yuan Zhou, Qinglin Lu, Ran Yi

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that PolyVivid achieves superior performance in identity fidelity, video realism, and subject alignment, outperforming existing open-source and commercial baselines. We compared PolyVivid with the state-of-the arts video customization methods, including commercial products (Vidu-2.0 [45], Kling-1.6 [25], Pika [37], and Hailuo [13]) and open-sourced methods (Skyreels-A2 [11] and VACE-1.3B [24]). For each model, we generate 100 videos, which are employed to compute the quantitative metrics.
Researcher Affiliation | Collaboration | Teng Hu (1), Zhentao Yu (2), Zhengguang Zhou (2), Jiangning Zhang (3), Yuan Zhou (2), Qinglin Lu (2), Ran Yi (1); (1) Shanghai Jiao Tong University, (2) Tencent Hunyuan, (3) Zhejiang University
Pseudocode | No | The paper describes methods and processes like 'Clique-based Subject Consolidation' and 'Attention-inherited Identity Injection' using prose and mathematical formulations (equations 2-10), but does not present any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: This project is a relatively large project that contains a lot of real name information. We will open-source the code after the paper is published.
Open Datasets | Yes | We curate a large set of high-quality data from open-source datasets, including Panda-70M [6] and Koala-36M [47], as well as our own collected data.
Dataset Splits | No | The paper mentions two training stages, each involving 5,000 iterations, for single-subject and multi-subject data. It also describes the creation of a 'Test Dataset' consisting of 100 image pairs. However, it does not provide specific train/validation/test splits (e.g., percentages or counts) for the overall datasets used to train the PolyVivid model (e.g., Panda-70M, Koala-36M).
Hardware Specification | Yes | All training processes are conducted on 256 GPUs, each with more than 80GB of memory, using a batch size of 256.
Software Dependencies | No | The paper mentions various models and tools used, such as LLaVA [31], HunyuanVideo [27], Florence2 [51], YOLOv11 [26], and DINO-v2 [35], but it does not specify explicit version numbers for programming languages, libraries, or frameworks used for the implementation (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | To enhance the efficiency of the training process, we divide it into two distinct stages. The first stage focuses on modeling the identity preservation capability... This stage involves 5,000 iterations. Once the model has effectively learned identity preservation, we proceed to the second stage... This stage also comprises 5,000 iterations. Additionally, ... in each stage, we initially train the model at reduced sizes for 1,000 iterations (included in the total 5,000 iterations)... All training processes are conducted on 256 GPUs, each with more than 80GB of memory, using a batch size of 256.
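
The training schedule quoted in the Experiment Setup and Hardware Specification rows can be tallied with a short sketch. This is only an illustration of the arithmetic, not the authors' configuration: the stage labels, the Stage class, and the summarize helper are hypothetical, and the code assumes that the reported batch size of 256 is the global batch size spread evenly across the 256 GPUs, which the quoted text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage as described in the quoted setup (labels are hypothetical)."""
    name: str
    total_iters: int         # iterations reported for the stage
    reduced_size_iters: int  # initial reduced-size iterations, included in total_iters

    @property
    def full_size_iters(self) -> int:
        # Iterations left after the reduced-size phase.
        return self.total_iters - self.reduced_size_iters


NUM_GPUS = 256
GLOBAL_BATCH_SIZE = 256  # assumption: the reported batch size is the global one

STAGES = (
    Stage("single_subject", total_iters=5000, reduced_size_iters=1000),
    Stage("multi_subject", total_iters=5000, reduced_size_iters=1000),
)


def summarize(stages=STAGES, num_gpus=NUM_GPUS, global_batch=GLOBAL_BATCH_SIZE):
    """Aggregate iteration and sample counts over all stages."""
    total_iters = sum(s.total_iters for s in stages)
    return {
        "total_iters": total_iters,
        "full_size_iters": sum(s.full_size_iters for s in stages),
        "samples_seen": total_iters * global_batch,
        "per_gpu_batch": global_batch // num_gpus,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Under these assumptions the two stages total 10,000 iterations, of which 8,000 run after the reduced-size phase, and each GPU would process one sample per step. If the reported batch size were per GPU instead, the sample count would scale by the number of GPUs.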