FiLM: Visual Reasoning with a General Conditioning Layer

Authors: Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that FiLM layers are highly effective for visual reasoning (answering image-related questions which require a multi-step, high-level process), a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot. (A minimal sketch of the FiLM layer appears after this table.)
Researcher Affiliation | Academia | Ethan Perez (1,2), Florian Strub (4), Harm de Vries (1), Vincent Dumoulin (1), Aaron Courville (1,3); (1) MILA, Université de Montréal, (2) Rice University, (3) CIFAR Fellow, (4) Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL, France. ethanperez@rice.edu, florian.strub@inria.fr, mail@harmdevries.com, {dumouliv,courvila}@iro.umontreal.ca
Pseudocode | No | The paper describes the model architecture and components in text and with diagrams (Figure 2, Figure 3), but it does not contain any blocks explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format.
Open Source Code | Yes | Our code is available at https://github.com/ethanjperez/film.
Open Datasets | Yes | CLEVR is a synthetic dataset of 700K (image, question, answer, program) tuples (Johnson et al. 2017a).
Dataset Splits | Yes | CLEVR is a synthetic dataset of 700K (image, question, answer, program) tuples (Johnson et al. 2017a). ... The number of samples is limited: 18K for training, 7K for validation, and 7K for testing. ... We employ early stopping based on validation accuracy, training for 80 epochs maximum.
Hardware Specification | Yes | We thank NVIDIA for donating a DGX-1 computer used in this work.
Software Dependencies | No | The paper mentions 'PyTorch (pytorch.org)' as the basis for their implementation and 'Adam (Kingma and Ba 2015)' as the optimizer, but it does not provide specific version numbers for any software libraries or dependencies, such as PyTorch, Python, or CUDA versions.
Experiment Setup | Yes | We train our model end-to-end from scratch with Adam (Kingma and Ba 2015) (learning rate 3e-4), weight decay (1e-5), batch size 64, and batch normalization and ReLU throughout the FiLM-ed network. (A training-loop sketch using these settings appears after this table.)
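
The Research Type row summarizes what a FiLM layer does. For readers unfamiliar with the mechanism, below is a minimal PyTorch sketch of feature-wise linear modulation: a conditioning input (here, a question embedding) predicts a per-feature-map scale gamma and shift beta that are applied to convolutional features. The names and example sizes (`question_dim`, `num_features`) are illustrative assumptions, not the authors' released implementation (see the Open Source Code row for that).

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: FiLM(x) = gamma * x + beta,
    with gamma and beta predicted from a conditioning input and
    applied per feature map (broadcast over spatial locations)."""

    def __init__(self, question_dim: int, num_features: int):
        super().__init__()
        # The "FiLM generator": maps the question embedding to one
        # (gamma, beta) pair per feature map.
        self.generator = nn.Linear(question_dim, 2 * num_features)

    def forward(self, feature_maps: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
        # feature_maps: (batch, num_features, height, width)
        # question:     (batch, question_dim)
        gamma, beta = self.generator(question).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # broadcast over H and W
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * feature_maps + beta

# Illustrative usage with made-up sizes.
film = FiLM(question_dim=4096, num_features=128)
modulated = film(torch.randn(2, 128, 14, 14), torch.randn(2, 4096))
```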
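
The Dataset Splits and Experiment Setup rows quote the training recipe: Adam with learning rate 3e-4 and weight decay 1e-5, batch size 64, at most 80 epochs, and early stopping on validation accuracy. A hedged sketch of how those pieces could be wired together in PyTorch follows; `model`, the data loaders, and `evaluate` are hypothetical placeholders supplied by the caller, and only the quoted hyperparameter values come from the paper.

```python
import torch
import torch.nn as nn

def train_film(model, train_loader, val_loader, evaluate, max_epochs=80):
    """Training-loop sketch using the hyperparameters quoted above.
    `model`, the loaders (assumed to yield batches of 64), and
    `evaluate` (returning validation accuracy) are placeholders."""
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-5)
    criterion = nn.CrossEntropyLoss()

    best_val_acc, best_state = 0.0, None
    for epoch in range(max_epochs):  # "training for 80 epochs maximum"
        model.train()
        for images, questions, answers in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images, questions), answers)
            loss.backward()
            optimizer.step()

        # Early stopping based on validation accuracy: keep the best weights.
        val_acc = evaluate(model, val_loader)
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

    if best_state is not None:
        model.load_state_dict(best_state)
    return best_val_acc
```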