Compositional Attention Networks for Machine Reasoning
Authors: Drew A. Hudson, Christopher D. Manning
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on the recent CLEVR dataset (Johnson et al., 2016). CLEVR is a synthetic dataset consisting of 700K tuples; each consists of a 3D-rendered image featuring objects of various shapes, colors, materials and sizes, coupled with compositional multi-step questions that measure performance on an array of challenging reasoning skills such as following transitive relations, counting objects and comparing their properties. |
| Researcher Affiliation | Academia | Drew A. Hudson Department of Computer Science Stanford University dorarad@cs.stanford.edu Christopher D. Manning Department of Computer Science Stanford University manning@cs.stanford.edu |
| Pseudocode | No | The paper provides detailed descriptions of the model's architecture and mathematical formulations in equations, but it does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | A TensorFlow implementation of the network, along with pretrained models will be made publicly available. |
| Open Datasets | Yes | We evaluate our model on the recent CLEVR dataset (Johnson et al., 2016). |
| Dataset Splits | Yes | We evaluate our model on the recent CLEVR dataset (Johnson et al., 2016). |
| Hardware Specification | Yes | The training process takes roughly 10-20 hours on a single Titan X GPU. |
| Software Dependencies | No | The paper mentions using TensorFlow for implementation and GloVe for word embeddings, but it does not provide specific version numbers for these or other software libraries used in the experiments. |
| Experiment Setup | Yes | We use MAC network with p = 12 cells, and train it using Adam (Kingma & Ba, 2014), with learning rate 10^-4. We train our model for 10-20 epochs, with batch size 64, and use early stopping based on validation accuracies. During training, the moving averages of all weights of the model are maintained with the exponential decay rate of 0.999. At test time, the moving averages instead of the raw weights are used. We use dropout 0.85, and ELU (Clevert et al., 2015)... |
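The experiment-setup quote above can be summarized as a concrete configuration, together with the exponential-moving-average (EMA) weight trick the paper describes: shadow copies of all weights are updated with decay 0.999 during training, and the averaged values replace the raw weights at test time. This is a minimal illustrative sketch; the names `TrainConfig` and `WeightEMA` are hypothetical and not from the paper, which only states the hyperparameters.

```python
# Hypothetical sketch of the reported training configuration and the
# EMA-of-weights technique described in the quoted setup. Class and
# field names are illustrative assumptions, not the authors' code.

from dataclasses import dataclass
from typing import Dict


@dataclass
class TrainConfig:
    mac_cells: int = 12           # p = 12 reasoning cells
    learning_rate: float = 1e-4   # Adam optimizer
    batch_size: int = 64
    epochs: int = 20              # paper reports 10-20, with early stopping
    dropout_keep_prob: float = 0.85
    ema_decay: float = 0.999


class WeightEMA:
    """Maintain exponential moving averages of (scalar) model weights."""

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.shadow: Dict[str, float] = {}

    def update(self, weights: Dict[str, float]) -> None:
        # shadow <- decay * shadow + (1 - decay) * current weight
        for name, value in weights.items():
            if name not in self.shadow:
                self.shadow[name] = value  # initialize on first update
            else:
                self.shadow[name] = (
                    self.decay * self.shadow[name] + (1.0 - self.decay) * value
                )

    def averaged(self) -> Dict[str, float]:
        # At test time, these averages are used instead of the raw weights.
        return dict(self.shadow)
```

In a real training loop, `update` would be called after each optimizer step over the model's parameter tensors rather than scalar floats; the scalar version above only illustrates the update rule.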