GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER
Authors: Mingzhen Sun, Weining Wang, Zihan Qin, Jiahui Sun, Sihan Chen, Jing Liu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experimental results demonstrate the effectiveness and efficiency of our proposed method, and new state-of-the-art results have been achieved on multiple benchmarks." |
| Researcher Affiliation | Academia | 1Institute of Automation, Chinese Academy of Sciences (CASIA) 2School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS) |
| Pseudocode | No | The paper describes its methods in prose and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codes have been released in https://github.com/iva-mzsun/GLOBER |
| Open Datasets | Yes | "Table 1 and Table 2 report the results of our model trained on the Sky Time-lapse [36], Tai Chi-HD [37], UCF-101 [38], and Webvid-10M [39] datasets for 16-frame video generation in both unconditional and conditional settings." |
| Dataset Splits | No | The paper mentions training on specific datasets (UCF101, Tai Chi HD, and Sky Time-lapse) but does not provide specific details on how these datasets were split into training, validation, or test sets for their experiments. |
| Hardware Specification | Yes | "All experiments are implemented using PyTorch [40] and conducted on 8 NVIDIA A100 GPUs, with 16-precision adopted for fast training." and "Results with * are taken from PVDM and measured with a single NVIDIA 3090ti 24GB GPU. The rest are evaluated on a single NVIDIA 3090 24GB GPU by us due to lack of 3090ti." |
| Software Dependencies | No | The paper states "All experiments are implemented using PyTorch [40]" but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The video auto-encoder was trained with a batch size of 40 per GPU for 80K, 40K, and 40K steps on the UCF101, Tai Chi HD, and Sky Time-lapse datasets, respectively. The loss weight λ1 and λ2 are set as 1e-6 and 0.1, respectively. |
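For reference, the hyperparameters quoted above can be gathered into a small configuration sketch. The key names below are illustrative only (they are not taken from the released code at https://github.com/iva-mzsun/GLOBER); only the numeric values come from the paper:

```python
# Sketch of GLOBER's reported video auto-encoder training setup.
# Key names are hypothetical; values are those quoted in the table above.
train_config = {
    "batch_size_per_gpu": 40,
    "num_gpus": 8,                      # 8 NVIDIA A100 GPUs
    "train_steps": {                    # steps per dataset
        "UCF101": 80_000,
        "TaiChi-HD": 40_000,
        "SkyTimelapse": 40_000,
    },
    "loss_weights": {"lambda_1": 1e-6, "lambda_2": 0.1},
    "precision": 16,                    # 16-bit precision for fast training
}

def effective_batch_size(cfg):
    """Total samples consumed per optimization step across all GPUs."""
    return cfg["batch_size_per_gpu"] * cfg["num_gpus"]

print(effective_batch_size(train_config))  # → 320
```

With 40 samples per GPU on 8 GPUs, each step processes 320 videos in total, which is the figure a reproduction attempt would need to match (or emulate via gradient accumulation on smaller hardware).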