FiLM: Visual Reasoning with a General Conditioning Layer

Authors: Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that FiLM layers are highly effective for visual reasoning (answering image-related questions which require a multi-step, high-level process), a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot. (A minimal sketch of the FiLM layer appears after this table.)
Researcher Affiliation | Academia | Ethan Perez (1,2), Florian Strub (4), Harm de Vries (1), Vincent Dumoulin (1), Aaron Courville (1,3); (1) MILA, Université de Montréal, (2) Rice University, (3) CIFAR Fellow, (4) Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL, France. ethanperez@rice.edu, florian.strub@inria.fr, mail@harmdevries.com, {dumouliv,courvila}@iro.umontreal.ca
Pseudocode | No | The paper describes the model architecture and components in text and with diagrams (Figure 2, Figure 3), but it does not contain any blocks explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format.
Open Source Code | Yes | Our code is available at https://github.com/ethanjperez/film.
Open Datasets | Yes | CLEVR is a synthetic dataset of 700K (image, question, answer, program) tuples (Johnson et al. 2017a).
Dataset Splits | Yes | CLEVR is a synthetic dataset of 700K (image, question, answer, program) tuples (Johnson et al. 2017a). ... The number of samples is limited: 18K for training, 7K for validation, and 7K for testing. ... We employ early stopping based on validation accuracy, training for 80 epochs maximum.
Hardware Specification | Yes | We thank NVIDIA for donating a DGX-1 computer used in this work.
Software Dependencies | No | The paper mentions 'PyTorch (pytorch.org)' as the basis for their implementation and 'Adam (Kingma and Ba 2015)' as the optimizer, but it does not provide specific version numbers for any software libraries or dependencies, such as PyTorch, Python, or CUDA versions.
Experiment Setup | Yes | We train our model end-to-end from scratch with Adam (Kingma and Ba 2015) (learning rate 3e-4), weight decay (1e-5), batch size 64, and batch normalization and ReLU throughout the FiLM-ed network. (A training-loop sketch using these settings appears after this table.)
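
The Research Type row summarizes what a FiLM layer does. For readers unfamiliar with the mechanism, below is a minimal PyTorch sketch of feature-wise linear modulation: a conditioning input (here, a question embedding) predicts a per-feature-map scale gamma and shift beta that are applied to convolutional features. The names and example sizes (`question_dim`, `num_features`) are illustrative assumptions, not the authors' released implementation (see the Open Source Code row for that).

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: FiLM(x) = gamma * x + beta,
    with gamma and beta predicted from a conditioning input and
    applied per feature map (broadcast over spatial locations)."""

    def __init__(self, question_dim: int, num_features: int):
        super().__init__()
        # The "FiLM generator": maps the question embedding to one
        # (gamma, beta) pair per feature map.
        self.generator = nn.Linear(question_dim, 2 * num_features)

    def forward(self, feature_maps: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
        # feature_maps: (batch, num_features, height, width)
        # question:     (batch, question_dim)
        gamma, beta = self.generator(question).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # broadcast over H and W
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * feature_maps + beta

# Illustrative usage with made-up sizes.
film = FiLM(question_dim=4096, num_features=128)
modulated = film(torch.randn(2, 128, 14, 14), torch.randn(2, 4096))
```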
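
The Dataset Splits and Experiment Setup rows quote the training recipe: Adam with learning rate 3e-4 and weight decay 1e-5, batch size 64, at most 80 epochs, and early stopping on validation accuracy. A hedged sketch of how those pieces could be wired together in PyTorch follows; `model`, the data loaders, and `evaluate` are hypothetical placeholders supplied by the caller, and only the quoted hyperparameter values come from the paper.

```python
import torch
import torch.nn as nn

def train_film(model, train_loader, val_loader, evaluate, max_epochs=80):
    """Training-loop sketch using the hyperparameters quoted above.
    `model`, the loaders (assumed to yield batches of 64), and
    `evaluate` (returning validation accuracy) are placeholders."""
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-5)
    criterion = nn.CrossEntropyLoss()

    best_val_acc, best_state = 0.0, None
    for epoch in range(max_epochs):  # "training for 80 epochs maximum"
        model.train()
        for images, questions, answers in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images, questions), answers)
            loss.backward()
            optimizer.step()

        # Early stopping based on validation accuracy: keep the best weights.
        val_acc = evaluate(model, val_loader)
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

    if best_state is not None:
        model.load_state_dict(best_state)
    return best_val_acc
```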