FiLM: Visual Reasoning with a General Conditioning Layer
Authors: Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that FiLM layers are highly effective for visual reasoning (answering image-related questions which require a multi-step, high-level process), a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot. (A minimal sketch of the FiLM operation appears after the table.) |
| Researcher Affiliation | Academia | Ethan Perez,1,2 Florian Strub,4 Harm de Vries,1 Vincent Dumoulin,1 Aaron Courville1,3 1MILA, Université de Montréal, 2Rice University, 3CIFAR Fellow, 4Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL, France; ethanperez@rice.edu, florian.strub@inria.fr, mail@harmdevries.com, {dumouliv,courvila}@iro.umontreal.ca |
| Pseudocode | No | The paper describes the model architecture and components in text and with diagrams (Figure 2, Figure 3), but it does not contain any blocks explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Our code is available at https://github.com/ethanjperez/film. |
| Open Datasets | Yes | CLEVR is a synthetic dataset of 700K (image, question, answer, program) tuples (Johnson et al. 2017a). |
| Dataset Splits | Yes | CLEVR is a synthetic dataset of 700K (image, question, answer, program) tuples (Johnson et al. 2017a). ... The number of samples is limited: 18K for training, 7K for validation, and 7K for testing. ... We employ early stopping based on validation accuracy, training for 80 epochs maximum. |
| Hardware Specification | Yes | We thank NVIDIA for donating a DGX-1 computer used in this work. |
| Software Dependencies | No | The paper mentions 'PyTorch (pytorch.org)' as the basis for their implementation and 'Adam (Kingma and Ba 2015)' as the optimizer, but it does not provide specific version numbers for any software libraries or dependencies, such as PyTorch, Python, or CUDA versions. |
| Experiment Setup | Yes | We train our model end-to-end from scratch with Adam (Kingma and Ba 2015) (learning rate 3e-4), weight decay (1e-5), batch size 64, and batch normalization and ReLU throughout the FiLM-ed network. (See the training-setup sketch after the table.) |
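
For readers assessing reproducibility, the paper's core mechanism is the feature-wise affine modulation FiLM(F | γ, β) = γ · F + β, applied per feature map with γ and β predicted from the conditioning input (the question). The sketch below is illustrative only: class and variable names are ours, not from the authors' repository, and the question encoder that produces the conditioning vector is replaced by a random tensor.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift each convolutional
    feature map with per-channel parameters predicted from a conditioning
    input (in the paper, a GRU encoding of the question)."""

    def forward(self, features, gamma, beta):
        # features: (batch, channels, height, width)
        # gamma, beta: (batch, channels), broadcast over the spatial dims
        gamma = gamma[:, :, None, None]
        beta = beta[:, :, None, None]
        return gamma * features + beta

# Illustrative usage with placeholder shapes.
film = FiLM()
x = torch.randn(64, 128, 14, 14)    # conv feature maps
cond = torch.randn(64, 256)         # stand-in for a question embedding
to_film = nn.Linear(256, 2 * 128)   # predicts gamma and beta jointly
gamma, beta = to_film(cond).chunk(2, dim=1)
y = film(x, gamma, beta)            # modulated features, same shape as x
```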
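
The quoted optimizer settings also translate directly into PyTorch. This is a sketch of the reported configuration with a dummy model and dataset, not the authors' training script; only the hyperparameter values (Adam, learning rate 3e-4, weight decay 1e-5, batch size 64) come from the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)  # placeholder; the paper trains the FiLM-ed CNN end-to-end
train_set = TensorDataset(torch.randn(256, 10), torch.randint(2, (256,)))  # dummy data

# Settings quoted above: Adam with learning rate 3e-4 and weight decay 1e-5, batch size 64.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-5)
loader = DataLoader(train_set, batch_size=64, shuffle=True)
loss_fn = nn.CrossEntropyLoss()

for x, y in loader:  # one epoch; the paper trains up to 80 with early stopping
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```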