Learning by Abstraction: The Neural State Machine

Authors: Drew Hudson, Christopher D. Manning

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model on VQA-CP and GQA, two recent VQA datasets that involve compositionality, multi-step inference and diverse reasoning skills, achieving state-of-the-art results in both cases. We provide further experiments that illustrate the model's strong generalization capacity across multiple dimensions, including novel compositions of concepts, changes in the answer distribution, and unseen linguistic structures, demonstrating the qualities and efficacy of our approach.
Researcher Affiliation | Academia | Drew A. Hudson, Stanford University, 353 Serra Mall, Stanford, CA 94305, dorarad@cs.stanford.edu; Christopher D. Manning, Stanford University, 353 Serra Mall, Stanford, CA 94305, manning@cs.stanford.edu
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | No | The model has been implemented in Tensorflow, and will be released along with the features and instructions for reproducing the described experiments.
Open Datasets | Yes | We evaluate our model (NSM) on two recent VQA datasets: (1) The GQA dataset [41] which focuses on real-world visual reasoning and compositional question answering, and (2) VQA-CP (version 2) [3], a recent split of the VQA dataset [27] that has been particularly designed to test generalization skills across changes in the answer distribution between the training and the test sets.
Dataset Splits | No | The paper mentions using a "validation set" for GQA in Section 6.5, but does not provide specific details on its size, proportion, or how it was split from the main dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper states the model was "implemented in Tensorflow" but does not provide specific version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | Both our model and implemented baselines are trained to minimize the cross-entropy loss of the predicted candidate answer (out of the top 2000 possibilities), using a hidden state size of d = 300 and, unless otherwise stated, length of N = 8 computation steps for the MAC and NSM models. Please refer to section 6.5 for further information about the training procedure, implementation details, hyperparameter configuration and data preprocessing... (from Section 4) and We use the Adam optimizer [47] with an initial learning rate of 1e-4, decaying by 0.2 after every 3 epochs... We train the model for 30 epochs with a batch size of 64. (from Section 6.5)
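As a concrete reading of the experiment-setup row above, the sketch below shows one way the quoted training configuration could be wired up in TensorFlow. It is not the authors' released code (none is available): `build_nsm_model` and `train_dataset` are hypothetical placeholders, and "decaying by 0.2 after every 3 epochs" is read here as multiplying the learning rate by 0.2 every 3 epochs. Only the quoted hyperparameters (Adam, initial learning rate 1e-4, cross-entropy over the top 2000 answers, d = 300, N = 8, batch size 64, 30 epochs) come from the paper.

```python
import tensorflow as tf

# Hyperparameters quoted in the paper (Sections 4 and 6.5).
NUM_ANSWERS = 2000        # top candidate answers (softmax classes)
HIDDEN_SIZE = 300         # hidden state size d
NUM_STEPS = 8             # N computation steps for the MAC / NSM models
BATCH_SIZE = 64
NUM_EPOCHS = 30
INITIAL_LR = 1e-4
DECAY_FACTOR = 0.2        # assumed: multiply the rate by 0.2 ...
DECAY_EVERY_EPOCHS = 3    # ... after every 3 epochs

def lr_for_epoch(epoch: int) -> float:
    """Step-decay schedule under the assumption stated above."""
    return INITIAL_LR * (DECAY_FACTOR ** (epoch // DECAY_EVERY_EPOCHS))

# Hypothetical placeholders: the NSM architecture and the GQA / VQA-CP
# input pipeline cannot be reconstructed from this table alone.
model = build_nsm_model(hidden_size=HIDDEN_SIZE,
                        num_steps=NUM_STEPS,
                        num_answers=NUM_ANSWERS)
optimizer = tf.keras.optimizers.Adam(learning_rate=INITIAL_LR)

# Cross-entropy over the 2000 candidate answers, as stated in Section 4.
model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(
    train_dataset.batch(BATCH_SIZE),   # hypothetical tf.data pipeline
    epochs=NUM_EPOCHS,
    callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_for_epoch)],
)
```

The `LearningRateScheduler` callback is just one way to express the step decay; `tf.keras.optimizers.schedules.ExponentialDecay` with `staircase=True` would give an approximately equivalent schedule if its `decay_steps` were set to three epochs' worth of optimizer steps.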