Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation
Authors: Sunghyun Park, Kangyeol Kim, Junsoo Lee, Jaegul Choo, Joonseok Lee, Sookyung Kim, Edward Choi
AAAI 2021, pp. 2412-2422 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With extensive experiments on four real-world video datasets, we verify that the proposed Vid-ODE outperforms state-of-the-art approaches under various video generation settings, both within the trained time range (interpolation) and beyond the range (extrapolation). |
| Researcher Affiliation | Collaboration | 1 KAIST, 2 Google Research, 3 Lawrence Livermore Nat'l Lab. |
| Pseudocode | No | The paper describes the proposed method using mathematical formulations and textual descriptions, but it does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | KTH Action (Schuldt, Laptev, and Caputo 2004) consists of 399 videos... Moving GIF (Siarohin et al. 2019) consists of 1,000 videos... Penn Action (Zhang, Zhu, and Derpanis 2013) consists of videos of humans playing sports... CAM5 (Kim et al. 2019) is a hurricane video dataset... Bouncing Ball contains three balls moving in different directions... |
| Dataset Splits | No | For our evaluation, we employ and preprocess the four real-world datasets and the one synthetic dataset as follows: KTH Action (Schuldt, Laptev, and Caputo 2004) consists of 399 videos of 25 subjects performing six different types of actions (walking, jogging, running, boxing, hand waving, and hand clapping). We use 255 videos of 16 (out of 25) subjects for training and the rest for testing. Moving GIF (Siarohin et al. 2019) consists of 1,000 videos of animated animal characters... We use 900 for training and 100 for testing. Penn Action (Zhang, Zhu, and Derpanis 2013) consists of videos of humans playing sports... We use 1,258 videos for training and 1,068 for testing. CAM5 (Kim et al. 2019) is a hurricane video dataset... We use 280 out of these for training and 39 for testing. Bouncing Ball... We use 1,000 videos for training and 50 videos for testing. The paper specifies training and testing splits for all datasets but does not explicitly quantify a separate validation split. |
| Hardware Specification | Yes | As for training ODEs, Vid-ODE required only 7 hours for training on the KTH Action dataset using a single NVIDIA Titan RTX (using 6.5GB VRAM). |
| Software Dependencies | No | The paper mentions using Adamax as an optimization method but does not provide specific software or library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) that were used for implementation. |
| Experiment Setup | Yes | We employ Adamax (Kingma and Ba 2014), a widely-used optimization method to iteratively train the ODE-based model. We train Vid-ODE for 500 epochs with a batch size of 8. The learning rate is set initially as 0.001, then exponentially decaying at a rate of 0.99 per epoch. For hyperparameters of Vid-ODE, we use λdiff = 1.0, λimg = 0.003, and λseq = 0.003. |
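The Experiment Setup row above specifies the optimizer, learning-rate schedule, and loss weights in enough detail to reconstruct a training loop. The sketch below is a hypothetical PyTorch illustration only: the model, data, and individual loss terms are toy stand-ins (no code is released), and only the Adamax optimizer, the 0.001 learning rate with 0.99 per-epoch exponential decay, the 500-epoch / batch-size-8 schedule, and the weights λdiff = 1.0, λimg = 0.003, λseq = 0.003 are taken from the reported setup.

```python
import torch
import torch.nn as nn

# Toy stand-in for the Vid-ODE model (the real architecture is not released).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 3 * 64 * 64))

# Reported optimization settings: Adamax, initial lr 0.001,
# decayed exponentially at a rate of 0.99 per epoch.
optimizer = torch.optim.Adamax(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

# Reported loss weights.
lambda_diff, lambda_img, lambda_seq = 1.0, 0.003, 0.003

num_epochs, batch_size = 500, 8  # as reported in the paper

for epoch in range(num_epochs):
    # Dummy batch standing in for one epoch of (input frames, target frames).
    frames = torch.rand(batch_size, 3, 64, 64)
    targets = torch.rand(batch_size, 3, 64, 64)

    optimizer.zero_grad()
    pred = model(frames).view_as(targets)

    # Placeholder loss terms: only the weights come from the paper; the paper's
    # actual pixel-difference and image/sequence adversarial losses are not shown here.
    l_diff = (pred - targets).abs().mean()
    l_img = torch.zeros(())   # image-discriminator loss would go here
    l_seq = torch.zeros(())   # sequence-discriminator loss would go here
    loss = lambda_diff * l_diff + lambda_img * l_img + lambda_seq * l_seq

    loss.backward()
    optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch
```

Because the Software Dependencies row reports no framework or version information and the Open Source Code row reports no released implementation, a reconstruction like this cannot be checked against the authors' actual code.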