A simple neural network module for relational reasoning
Authors: Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. |
| Researcher Affiliation | Industry | Adam Santoro (adamsantoro@google.com), David Raposo (draposo@google.com), David G.T. Barrett (barrettdavid@google.com), Mateusz Malinowski (mateuszm@google.com), Razvan Pascanu (razp@google.com), Peter Battaglia (peterbattaglia@google.com), Timothy Lillicrap (countzero@google.com); all at DeepMind, London, United Kingdom |
| Pseudocode | No | The paper describes the architecture and functions mathematically and in text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that the Sort-of-CLEVR dataset will be made publicly available, but there is no explicit statement or link indicating that the source code for their methodology is open-source or publicly available. |
| Open Datasets | Yes | We used two versions of the CLEVR dataset: (i) the pixel version, in which images were represented in standard 2D pixel form; (ii) a state description version, in which images were explicitly represented by state description matrices containing factored object descriptions. The Sort-of-CLEVR dataset will be made publicly available online. Our model was trained on the joint version of bAbI (all 20 tasks simultaneously), using the full dataset of 10K examples per task. |
| Dataset Splits | Yes | Our model was trained on the joint version of bAbI (all 20 tasks simultaneously), using the full dataset of 10K examples per task. Our model achieved state-of-the-art performance on CLEVR at 95.5%, exceeding the best model trained only on the pixel images and questions at the time of the dataset's publication by 27%, and surpassing human performance in the task (see Table 1 and Figure 3). The model we evaluated was chosen based on overall performance on a withheld validation set, using a single seed. |
| Hardware Specification | No | The paper mentions 'distributed training with 10 workers synchronously updating a central parameter server' but does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions the use of the 'Adam optimizer' and 'ReLU non-linearities', but it does not specify versions for any programming languages, libraries, or frameworks (e.g., Python version, TensorFlow/PyTorch version). |
| Experiment Setup | Yes | For the CLEVR-from-pixels task we used: 4 convolutional layers each with 24 kernels, ReLU non-linearities, and batch normalization; a 128-unit LSTM for question processing; 32-unit word-lookup embeddings; a four-layer MLP consisting of 256 units per layer with ReLU non-linearities for gθ; and a three-layer MLP consisting of 256, 256 (with 50% dropout), and 29 units with ReLU non-linearities for fφ. The final layer was a linear layer that produced logits for a softmax over the answer vocabulary. The softmax output was optimized with a cross-entropy loss function using the Adam optimizer with a learning rate of 2.5e-4. We used mini-batches of size 64 and distributed training with 10 workers synchronously updating a central parameter server. |
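
As the Pseudocode row notes, the paper specifies the model only in prose and equations. For reference, the composite function that defines a Relation Network, as given in the paper (with the LSTM question embedding $q$ conditioning the pair function for CLEVR), is:

$$\mathrm{RN}(O) = f_{\phi}\left( \sum_{i,j} g_{\theta}(o_i, o_j, q) \right)$$

where $O = \{o_1, \ldots, o_n\}$ is the set of input objects and $f_{\phi}$ and $g_{\theta}$ are MLPs with learnable parameters $\phi$ and $\theta$.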
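
No official implementation is linked (see the Open Source Code row), so the following is a minimal PyTorch sketch of the CLEVR-from-pixels setup described in the Experiment Setup row, not the authors' code. Layer sizes follow the paper (4 convolutional layers of 24 kernels, a 128-unit LSTM, 32-dim word embeddings, a four-layer 256-unit gθ, and a 256 / 256-with-dropout / 29 fφ); the convolution stride and padding, the (x, y) coordinate tagging of objects, and all class and variable names are assumptions.

```python
# Minimal sketch (assumptions, not the authors' code) of the CLEVR-from-pixels
# RN model. Layer sizes follow the paper; strides, padding, and the coordinate
# tagging scheme are guesses.
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    def __init__(self, vocab_size, num_answers=29):
        super().__init__()
        # 4 convolutional layers, 24 kernels each, with ReLU and batch norm.
        layers, in_ch = [], 3
        for _ in range(4):
            layers += [nn.Conv2d(in_ch, 24, kernel_size=3, stride=2, padding=1),
                       nn.ReLU(), nn.BatchNorm2d(24)]
            in_ch = 24
        self.cnn = nn.Sequential(*layers)
        # 32-unit word embeddings feeding a 128-unit question LSTM.
        self.embed = nn.Embedding(vocab_size, 32)
        self.lstm = nn.LSTM(32, 128, batch_first=True)
        obj_dim = 24 + 2                  # feature channels + (x, y) tag
        # g_theta: four-layer MLP, 256 units per layer, over object pairs + q.
        self.g = nn.Sequential(
            nn.Linear(2 * obj_dim + 128, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU())
        # f_phi: 256, 256 (50% dropout), then a linear layer to answer logits.
        self.f = nn.Sequential(
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_answers))

    def forward(self, img, question):
        # img: (B, 3, H0, W0) float; question: (B, T) long token ids.
        feats = self.cnn(img)                               # (B, 24, H, W)
        B, C, H, W = feats.shape
        objs = feats.reshape(B, C, H * W).permute(0, 2, 1)  # one "object" per cell
        # Tag each object with its normalized grid coordinate.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                                torch.linspace(-1, 1, W), indexing="ij")
        coords = torch.stack([xs, ys], -1).reshape(1, H * W, 2)
        objs = torch.cat([objs, coords.expand(B, -1, -1).to(objs.device)], -1)
        _, (q, _) = self.lstm(self.embed(question))
        q = q.squeeze(0)                                    # (B, 128)
        N, D = objs.size(1), objs.size(2)
        # All N*N ordered object pairs, each conditioned on the question.
        oi = objs.unsqueeze(2).expand(B, N, N, D)
        oj = objs.unsqueeze(1).expand(B, N, N, D)
        qq = q.view(B, 1, 1, 128).expand(B, N, N, 128)
        # RN(O) = f_phi( sum_{i,j} g_theta(o_i, o_j, q) )
        rel = self.g(torch.cat([oi, oj, qq], -1)).sum(dim=(1, 2))
        return self.f(rel)                                  # (B, num_answers)
```

Per the table, training used the Adam optimizer at learning rate 2.5e-4 with mini-batches of size 64 (distributed over 10 synchronous workers in the paper); a single-process stand-in would be `torch.optim.Adam(model.parameters(), lr=2.5e-4)`.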