Generative Video Transformer: Can Objects be the Words?

Authors: Yi-Fu Wu, Jaesik Yoon, Sungjin Ahn

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare our model with previous RNN-based approaches as well as other possible video transformer baselines. We demonstrate OCVT performs well when compared to baselines in generating future frames. OCVT also develops useful representations for video reasoning, achieving state-of-the-art performance on the CATER task."
Researcher Affiliation | Collaboration | ¹Department of Computer Science, Rutgers University; ²SAP Labs; ³Rutgers Center for Cognitive Science.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (such as a specific repository link or an explicit code-release statement) for its methodology.
Open Datasets | Yes | "We also evaluate on the CATER dataset (Girdhar & Ramanan, 2020), a video-understanding benchmark that requires long-term temporal reasoning."
Dataset Splits | No | The paper specifies training lengths for the bouncing-balls dataset (e.g., "For Mod1, we train on 20 frames"), but it does not provide explicit training, validation, and test splits with percentages, sample counts, or references to predefined splits.
Hardware Specification | No | The paper mentions "a single 48GB GPU" but does not specify the exact model (e.g., NVIDIA A100) or other hardware details such as CPU, memory, or cloud instance types.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming-language, library, or solver versions.
Experiment Setup | Yes | "We then apply the following formula to obtain the predicted bounding box: ẑ_{t+1}^where = z_t^where + c·tanh(ẑ_{t+1}^where), where c is a hyperparameter between 0 and 1 controlling the maximum update in one timestep. ... β_where, β_depth, and β_pres are hyperparameters used to control the contribution of each loss term."
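The bounding-box update quoted in the Experiment Setup row can be sketched as a tanh-bounded residual step. The sketch below is illustrative, not the authors' implementation: the function name and the assumption that `raw_delta` is the unbounded network output fed through tanh are mine; only the update rule z + c·tanh(·) with c in (0, 1) comes from the paper.

```python
import numpy as np

def bounded_bbox_update(z_where, raw_delta, c=0.1):
    """Advance a bounding box by a tanh-bounded delta.

    z_where:   current box parameters, e.g. [x, y, w, h]
    raw_delta: unbounded prediction for the next step (assumed network output)
    c:         hyperparameter in (0, 1) capping the per-step change per coordinate
    """
    return z_where + c * np.tanh(raw_delta)

# Even a very large raw prediction moves each coordinate by at most c,
# while a zero prediction leaves the box unchanged.
z = np.array([0.5, 0.5, 0.2, 0.2])
z_next = bounded_bbox_update(z, np.array([10.0, -10.0, 0.0, 1.0]), c=0.1)
```

Because tanh saturates at ±1, the maximum displacement per timestep is exactly c, which keeps the predicted boxes temporally smooth.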