Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

CLEVRER: Collision Events for Video Representation and Reasoning

Authors: Kexin Yi*, Chuang Gan*, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum

ICLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate various state-of-the-art models for visual reasoning on our benchmark. CLEVRER includes 20,000 synthetic videos of colliding objects and more than 300,000 questions and answers (Figure 1). We summarize the performances of all baseline models in Table 2.
Researcher Affiliation	Collaboration	Kexin Yi (Harvard University); Chuang Gan (MIT-IBM Watson AI Lab); Yunzhu Li (MIT CSAIL); Pushmeet Kohli (DeepMind); Jiajun Wu (MIT CSAIL); Antonio Torralba (MIT CSAIL); Joshua B. Tenenbaum (MIT BCS, CBMM, CSAIL)
Pseudocode	No	The paper describes model components and algorithms but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	The project page URL (http://clevrer.csail.mit.edu/) is provided, but the paper itself does not contain an explicit statement that the source code for the described methodology is publicly released or available, and the linked project page states “CLEVRER Code coming soon!”.
Open Datasets	Yes	The CoLlision Events for Video REpresentation and Reasoning (CLEVRER) dataset is introduced, with a project page: http://clevrer.csail.mit.edu/. CLEVRER includes 10,000 videos for training, 5,000 for validation, and 5,000 for testing.
Dataset Splits	Yes	CLEVRER includes 10,000 videos for training, 5,000 for validation, and 5,000 for testing.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions software and general models used.
Software Dependencies	No	The paper mentions various software components and models like "Bullet (Coumans, 2010) physics engine", "Blender (Blender Online Community, 2016)", and "Adam optimizer (Kingma & Ba, 2015)", but it does not provide specific version numbers for these software dependencies (e.g., "Bullet v2.x" or "Blender 2.79").
Experiment Setup	Yes	We train the model for 30,000 iterations with stochastic gradient descent, using a batch size of 6 and learning rate 0.001. We use the Adam optimizer (Kingma & Ba, 2015) with an initial learning rate of 10^-4, and a decay factor of 0.3 per 3 epochs. The model is trained for 9 epochs with batch size 2. All models are trained using Adam (Kingma & Ba, 2015) for 30,000 iterations with batch size 64 and learning rate 7 × 10^-4.
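
The second setup quoted in the Experiment Setup row (initial learning rate 10^-4, decay factor 0.3 per 3 epochs, 9 epochs) can be read as a step-decay schedule. The sketch below works through that arithmetic under one assumption not stated in the excerpt: the decay is applied as a discrete step at the start of every third epoch (0-indexed epochs 3 and 6), not continuously. The function name `step_decay_lr` and its parameter names are hypothetical and do not come from the paper.

```python
import math


def step_decay_lr(epoch, initial_lr=1e-4, decay_factor=0.3, step_epochs=3):
    """Learning rate for a 0-indexed epoch under step decay.

    Assumption: the rate is multiplied by `decay_factor` once every
    `step_epochs` epochs, as a discrete step.
    """
    return initial_lr * decay_factor ** (epoch // step_epochs)


# Learning rate for each of the 9 training epochs quoted in the table.
schedule = [step_decay_lr(epoch) for epoch in range(9)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.2e}")
```

Under this reading, the rate stays at 10^-4 for the first three epochs, drops to 3 × 10^-5 for the next three, and ends at 9 × 10^-6 for the final three.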