Towards Smooth Video Composition
Authors: Qihang Zhang, Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on a range of datasets and show substantial improvements over baselines on video generation. |
| Researcher Affiliation | Collaboration | Qihang Zhang¹, Ceyuan Yang², Yujun Shen³, Yinghao Xu¹, Bolei Zhou⁴ (¹The Chinese University of Hong Kong, ²Shanghai AI Laboratory, ³Ant Group, ⁴University of California, Los Angeles) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and models are publicly available at https://genforce.github.io/StyleSV. |
| Open Datasets | Yes | We evaluate our approach on a range of datasets and show substantial improvements over baselines on video generation. Code and models are publicly available at https://genforce.github.io/StyleSV. |
| Dataset Splits | No | The paper mentions evaluating results with the best FVD16 score after training, which implies a validation step, but it does not explicitly provide dataset split information (percentages, counts, or predefined splits) for training, validation, and testing needed to reproduce the data partitioning. |
| Hardware Specification | Yes | We follow the training recipe of StyleGAN-V and train models on a server with 8 A100 GPUs. |
| Software Dependencies | No | The paper states, 'Our method is developed based on the official implementation of StyleGAN-V (Skorokhodov et al., 2022),' but it does not provide specific version numbers for software components like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | In terms of various methods and datasets, we grid search the R1 regularization weight, whose details are available in the Appendix. Empirically, we find that a smaller R1 value (e.g., 0.25) works well for the pretraining stage (Config-C), while a larger R1 value (e.g., 4) better suits video generation learning. (A minimal sketch of the R1 penalty follows the table.) |
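
For context, the R1 weight referenced above is the coefficient on the R1 gradient penalty of Mescheder et al. (2018), which StyleGAN-family discriminators apply to real samples. Below is a minimal PyTorch sketch of that penalty; the discriminator interface, the NCHW image layout, and the function name are illustrative assumptions, and only the weights 0.25 (pretraining, Config-C) and 4 (video generation learning) come from the paper.

```python
import torch

def r1_penalty(discriminator, real_images, r1_weight):
    """R1 gradient penalty: (r1_weight / 2) * E[ ||grad_x D(x)||^2 ] over real x.

    The `discriminator` interface and NCHW `real_images` layout are assumed
    here; the paper only specifies the weight values (0.25 and 4).
    """
    real_images = real_images.detach().requires_grad_(True)
    scores = discriminator(real_images)
    # Differentiate the summed scores w.r.t. the real inputs; create_graph=True
    # lets the penalty itself be backpropagated through in D's update.
    (grads,) = torch.autograd.grad(
        outputs=scores.sum(), inputs=real_images, create_graph=True
    )
    # Squared gradient norm per sample, averaged over the batch.
    penalty = grads.square().sum(dim=[1, 2, 3]).mean()
    return 0.5 * r1_weight * penalty
```

In a StyleGAN-style training loop this term is added to the discriminator loss, e.g. `d_loss = adv_loss + r1_penalty(D, real_batch, r1_weight=0.25)` during pretraining and `r1_weight=4` for video generation learning. Official StyleGAN implementations also apply the penalty lazily (every N steps, with the weight scaled up accordingly), a detail omitted here for brevity.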