A simple neural network module for relational reasoning

Authors: Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems.
Researcher Affiliation | Industry | Adam Santoro (adamsantoro@google.com), David Raposo (draposo@google.com), David G.T. Barrett (barrettdavid@google.com), Mateusz Malinowski (mateuszm@google.com), Razvan Pascanu (razp@google.com), Peter Battaglia (peterbattaglia@google.com), Timothy Lillicrap (countzero@google.com); DeepMind, London, United Kingdom.
Pseudocode | No | The paper describes the architecture and functions mathematically and in text but does not include structured pseudocode or algorithm blocks. (Its core formula, RN(O) = fφ(Σ_{i,j} gθ(o_i, o_j)), is sketched in code after this table.)
Open Source Code | No | The paper states that the Sort-of-CLEVR dataset will be made publicly available, but there is no explicit statement or link indicating that the source code for their methodology is open-source or publicly available.
Open Datasets | Yes | We used two versions of the CLEVR dataset: (i) the pixel version, in which images were represented in standard 2D pixel form; (ii) a state description version, in which images were explicitly represented by state description matrices containing factored object descriptions. The Sort-of-CLEVR dataset will be made publicly available online. Our model was trained on the joint version of bAbI (all 20 tasks simultaneously), using the full dataset of 10K examples per task.
Dataset Splits | Yes | Our model was trained on the joint version of bAbI (all 20 tasks simultaneously), using the full dataset of 10K examples per task. Our model achieved state-of-the-art performance on CLEVR at 95.5%, exceeding the best model trained only on the pixel images and questions at the time of the dataset's publication by 27%, and surpassing human performance in the task (see Table 1 and Figure 3). The model we evaluated was chosen based on overall performance on a withheld validation set, using a single seed.
Hardware Specification | No | The paper mentions 'distributed training with 10 workers synchronously updating a central parameter server' but does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions the use of the 'Adam optimizer' and 'ReLU non-linearities', but it does not specify versions for any programming languages, libraries, or frameworks (e.g., Python version, TensorFlow/PyTorch version).
Experiment Setup | Yes | For the CLEVR-from-pixels task we used: 4 convolutional layers each with 24 kernels, ReLU non-linearities, and batch normalization; a 128-unit LSTM for question processing; 32-unit word-lookup embeddings; a four-layer MLP consisting of 256 units per layer with ReLU non-linearities for gθ; and a three-layer MLP consisting of 256, 256 (with 50% dropout), and 29 units with ReLU non-linearities for fφ. The final layer was a linear layer that produced logits for a softmax over the answer vocabulary. The softmax output was optimized with a cross-entropy loss function using the Adam optimizer with a learning rate of 2.5e-4. We used size 64 mini-batches and distributed training with 10 workers synchronously updating a central parameter server. (A hedged code sketch of this configuration appears after this table.)
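
Although the paper itself contains no pseudocode block, its core formulation, RN(O) = fφ(Σ_{i,j} gθ(o_i, o_j)), is compact enough to sketch directly. The following is a minimal, hypothetical PyTorch rendering; the class name, layer sizes, and dimensions are our assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class RelationNetwork(nn.Module):
        """Minimal sketch of RN(O) = f_phi(sum_{i,j} g_theta(o_i, o_j)).

        Layer sizes and dimensions are illustrative assumptions, not the
        paper's exact CLEVR configuration (see the second sketch below).
        """

        def __init__(self, object_dim=32, hidden=256, out_dim=10):
            super().__init__()
            # g_theta scores one ordered pair of objects.
            self.g = nn.Sequential(
                nn.Linear(2 * object_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU())
            # f_phi maps the summed relation vector to the output.
            self.f = nn.Sequential(
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim))

        def forward(self, objects):
            # objects: (batch, n_objects, object_dim)
            b, n, d = objects.shape
            o_i = objects.unsqueeze(2).expand(b, n, n, d)  # o_i repeated over j
            o_j = objects.unsqueeze(1).expand(b, n, n, d)  # o_j repeated over i
            pairs = torch.cat([o_i, o_j], dim=-1).reshape(b, n * n, 2 * d)
            # Apply g_theta to every pair, sum over all pairs, then f_phi.
            return self.f(self.g(pairs).sum(dim=1))

A call such as RelationNetwork()(torch.randn(4, 8, 32)) yields a (4, 10) tensor; the sum over pairs is what makes the module invariant to object ordering.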
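
The Experiment Setup row pins down most hyperparameters for the CLEVR-from-pixels model, so that configuration can also be sketched. Kernel size, stride, vocabulary size, and the gθ input width (which depends on the coordinate tagging and question concatenation the paper describes) are assumptions here; the layer counts, unit counts, dropout rate, learning rate, and batch size follow the row above.

    import torch
    import torch.nn as nn

    # Sizes from the Experiment Setup row; kernel size, stride, and
    # vocabulary size are assumptions.
    layers, in_ch = [], 3
    for _ in range(4):  # 4 convolutional layers, 24 kernels each
        layers += [nn.Conv2d(in_ch, 24, kernel_size=3, stride=2, padding=1),
                   nn.BatchNorm2d(24), nn.ReLU()]
        in_ch = 24
    cnn = nn.Sequential(*layers)

    embed = nn.Embedding(90, 32)               # 32-unit embeddings (vocab size assumed)
    lstm = nn.LSTM(32, 128, batch_first=True)  # 128-unit question LSTM

    # g_theta: four 256-unit layers. The 180-d input assumes two 24-d
    # feature-map "objects" tagged with 2-d coordinates each, plus the
    # 128-d question embedding: 2 * (24 + 2) + 128 = 180.
    g_theta = nn.Sequential(
        nn.Linear(180, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU())

    # f_phi: 256, 256 (with 50% dropout), then 29 answer logits from a
    # final linear layer, as the row states.
    f_phi = nn.Sequential(
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(256, 29))

    params = [p for m in (cnn, embed, lstm, g_theta, f_phi)
              for p in m.parameters()]
    optimizer = torch.optim.Adam(params, lr=2.5e-4)  # learning rate as stated
    loss_fn = nn.CrossEntropyLoss()                  # softmax cross-entropy
    BATCH_SIZE = 64                                  # mini-batch size as stated
    # The paper also used 10 workers synchronously updating a central
    # parameter server; that distributed setup is not reproduced here.

Pair construction and question concatenation would follow the pattern in the first sketch; the 29-way output matches the answer-vocabulary size given in the row.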