Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Compositional Video Synthesis with Action Graphs

Authors: Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson

ICML 2021 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We train and evaluate AG2Vid on the CATER and Something Something V2 datasets, and show that the resulting videos have better visual quality and semantic consistency compared to baselines.
Researcher Affiliation	Collaboration	(1) The Blavatnik School of Computer Science, Tel Aviv University; (2) UC San Diego; (3) UC Berkeley; (4) NVIDIA Research; (5) Bar-Ilan University.
Pseudocode	No	The paper describes the model architecture and processes in narrative text and figures (e.g., Figure 3) but does not provide structured pseudocode or algorithm blocks.
Open Source Code	Yes	See the project page for code and pretrained models: https://roeiherz.github.io/AG2Video.
Open Datasets	Yes	We use two datasets: (1) CATER (Girdhar & Ramanan, 2020)... (2) Something-Something V2 (Goyal et al., 2017)...
Dataset Splits	Yes	We employ the standard CATER training partition (3,849 videos) and split the validation into 30% val (495 videos) and use the rest for testing (1,156 videos).
Hardware Specification	Yes	Models were trained with a batch size of 2 which was the maximal batch size to fit on a single NVIDIA V100 GPU.
Software Dependencies	No	The paper mentions software components and frameworks like ADAM, SPADE generator, and GCNs, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	The GCN model uses K = 3 hidden layers and an embedding layer of 128 units for each object and action. For optimization we use ADAM (Kingma & Ba, 2014) with lr = 1e-4 and (β1, β2) = (0.5, 0.99). Models were trained with a batch size of 2... For loss weights we use λB = λF = λP = 10 and λA = 1. We use videos of 8 FPS and 6 FPS for CATER and Smth V2 and evaluate on videos consisting of 16 frames...
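
The values quoted in the Dataset Splits and Experiment Setup rows can be gathered into one configuration object, which makes them easier to check. The following is a minimal sketch, not the authors' code. The class name `AG2VidSetup`, its field names, and the helper `val_fraction` are hypothetical. The learning rate is read as 1e-4, and the 8 FPS and 6 FPS values are assumed to map to CATER and Smth V2 in that order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AG2VidSetup:
    """Hypothetical container: field names are illustrative, values come from the table."""
    gcn_hidden_layers: int = 3          # K = 3
    embedding_units: int = 128          # per object and per action
    learning_rate: float = 1e-4         # assumption: the exponent sign is negative
    adam_betas: tuple = (0.5, 0.99)     # (beta1, beta2)
    batch_size: int = 2                 # maximum that fit on one NVIDIA V100
    lambda_b: float = 10.0              # loss weight lambda_B
    lambda_f: float = 10.0              # loss weight lambda_F
    lambda_p: float = 10.0              # loss weight lambda_P
    lambda_a: float = 1.0               # loss weight lambda_A
    fps_cater: int = 8                  # assumption: "8 FPS and 6 FPS" are in dataset order
    fps_smth_v2: int = 6
    eval_frames: int = 16


# CATER video counts as quoted in the Dataset Splits row.
CATER_SPLITS = {"train": 3849, "val": 495, "test": 1156}


def val_fraction(splits):
    """Share of the held-out pool (val + test) that is used for validation."""
    held_out = splits["val"] + splits["test"]
    return splits["val"] / held_out


setup = AG2VidSetup()

# Duration in seconds of a 16-frame evaluation clip at each frame rate.
clip_seconds_cater = setup.eval_frames / setup.fps_cater
clip_seconds_smth_v2 = setup.eval_frames / setup.fps_smth_v2

if __name__ == "__main__":
    print(f"validation share of held-out pool: {val_fraction(CATER_SPLITS):.3f}")
    print(f"CATER clip length: {clip_seconds_cater:.2f} s")
    print(f"Smth V2 clip length: {clip_seconds_smth_v2:.2f} s")
```

Under these assumptions, the quoted split sizes agree with the stated 30% validation share (495 of 1,651 held-out videos), and a 16-frame clip spans 2 seconds on CATER and about 2.7 seconds on Smth V2.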