Compositional Video Synthesis with Action Graphs

Authors: Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train and evaluate AG2Vid on the CATER and Something Something V2 datasets, and show that the resulting videos have better visual quality and semantic consistency compared to baselines.
Researcher Affiliation | Collaboration | (1) The Blavatnik School of Computer Science, Tel Aviv University; (2) UC San Diego; (3) UC Berkeley; (4) NVIDIA Research; (5) Bar-Ilan University.
Pseudocode | No | The paper describes the model architecture and processes in narrative text and figures (e.g., Figure 3) but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | See the project page for code and pretrained models: https://roeiherz.github.io/AG2Video.
Open Datasets | Yes | We use two datasets: (1) CATER (Girdhar & Ramanan, 2020)... (2) Something-Something V2 (Goyal et al., 2017)...
Dataset Splits | Yes | We employ the standard CATER training partition (3,849 videos) and split the validation into 30% val (495 videos) and use the rest for testing (1,156 videos).
Hardware Specification | Yes | Models were trained with a batch size of 2 which was the maximal batch size to fit on a single NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions software components and frameworks like ADAM, SPADE generator, and GCNs, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | The GCN model uses K = 3 hidden layers and an embedding layer of 128 units for each object and action. For optimization we use ADAM (Kingma & Ba, 2014) with lr = 1e-4 and (β1, β2) = (0.5, 0.99). Models were trained with a batch size of 2... For loss weights we use λB = λF = λP = 10 and λA = 1. We use videos of 8 FPS and 6 FPS for CATER and Smth V2 and evaluate on videos consisting of 16 frames...
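
The numbers quoted in the Dataset Splits and Experiment Setup rows can be collected into a small configuration sketch. This is not the authors' code: the class name `AG2VidSetup`, its field names, and the helper functions are hypothetical, the learning rate is read as 1e-4, and combining the four loss terms as a plain weighted sum is an assumption suggested only by the phrase "loss weights". The sketch uses the Python standard library only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AG2VidSetup:
    """Reported hyperparameters; the field names are illustrative, not the authors'."""
    gcn_hidden_layers: int = 3        # K = 3
    embedding_dim: int = 128          # per object and per action
    learning_rate: float = 1e-4       # "lr = 1e 4" in the extracted text, read as 1e-4
    adam_betas: tuple = (0.5, 0.99)   # (beta1, beta2)
    batch_size: int = 2               # largest batch reported to fit on one V100
    lambda_b: float = 10.0
    lambda_f: float = 10.0
    lambda_p: float = 10.0
    lambda_a: float = 1.0
    num_frames: int = 16              # frames per evaluated video


def weighted_loss(cfg, loss_b, loss_f, loss_p, loss_a):
    """Assumed form: a linear combination of the four loss terms B, F, P, A."""
    return (cfg.lambda_b * loss_b + cfg.lambda_f * loss_f
            + cfg.lambda_p * loss_p + cfg.lambda_a * loss_a)


def clip_seconds(num_frames, fps):
    """Duration in seconds of a clip with num_frames frames sampled at fps."""
    return num_frames / fps


def val_fraction(n_val, n_test):
    """Share of the original validation set kept as the new validation split."""
    return n_val / (n_val + n_test)


cfg = AG2VidSetup()
print(clip_seconds(cfg.num_frames, 8))     # CATER at 8 FPS
print(clip_seconds(cfg.num_frames, 6))     # Smth V2 at 6 FPS
print(round(val_fraction(495, 1156), 3))   # reported split, expected to be about 0.3
```

Under these assumptions, a 16-frame clip covers 2 seconds at 8 FPS and roughly 2.7 seconds at 6 FPS, and 495 of the 1,651 original validation videos is consistent with the stated 30% split.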