Learning a Natural Language Interface with Neural Programmer

Authors: Arvind Neelakantan, Quoc V. Le, Martín Abadi, Andrew McCallum, Dario Amodei

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our main experimental results concern WikiTableQuestions (Pasupat & Liang, 2015), a real-world question-answering dataset, with only 10,000 examples for weak supervision. ... a single Neural Programmer model using minimal text pre-processing, and trained end-to-end, achieves 34.2% accuracy. An ensemble of 15 models, even with a trivial combination technique, achieves 37.7% accuracy.
Researcher Affiliation | Collaboration | Arvind Neelakantan, University of Massachusetts Amherst (arvind@cs.umass.edu); Quoc V. Le, Google Brain (qvl@google.com); Martín Abadi, Google Brain (abadi@google.com); Andrew McCallum, University of Massachusetts Amherst (mccallum@cs.umass.edu); Dario Amodei, OpenAI (damodei@openai.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Our code is available at https://github.com/tensorflow/models/tree/master/neural_programmer.
Open Datasets | Yes | We apply Neural Programmer on the WikiTableQuestions dataset (Pasupat & Liang, 2015)
Dataset Splits | Yes | We use the train, development, and test split given by Pasupat & Liang (2015). The dataset contains 11321, 2831, and 4344 examples for training, development, and testing respectively.
Hardware Specification | Yes | Our model is implemented in TensorFlow (Abadi et al., 2016) and the model takes approximately a day to train on a single Tesla K80 GPU.
Software Dependencies | No | The paper mentions TensorFlow (Abadi et al., 2016) as the implementation framework but does not give a version number for it or for any other software dependency.
Experiment Setup | Yes | We use T = 4 timesteps in our experiments. Words and operations are represented as 256-dimensional vectors, and the hidden vectors of the question and the history RNN are also 256-dimensional. The parameters are initialized uniformly randomly within the range [-0.1, 0.1]. We train the model using the Adam optimizer (Kingma & Ba, 2014) with mini-batches of size 20. The ϵ hyperparameter in Adam is set to 1e-6 while the others are set to their default values. Since the training set is small compared to other datasets in which neural network models are usually applied, we rely on strong regularization: we clip the gradients to norm 1 and employ early stopping. The occurrences of words that appear less than 10 times in the training set are replaced by a single unknown word token. We add a weight decay penalty with strength 0.0001. We use dropout with a keep probability of 0.8 on input and output vectors of the RNN, and on selector, operation, and column name representations (Srivastava et al., 2014). We use dropout with keep probability of 0.9 on the recurrent connections of the question RNN and history RNN using the technique from Gal & Ghahramani (2016). We use word-dropout (Iyyer et al., 2015) with keep probability of 0.9.
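
The Experiment Setup row above lists enough hyperparameters to reconstruct the reported training configuration. The sketch below is not the authors' released code; it is a minimal, hedged illustration that gathers those values into one place and builds the described Adam update with global-norm gradient clipping in TensorFlow 1.x style. The `HPARAMS` dictionary and the `training_op` helper are hypothetical names introduced here for clarity.

```python
# Minimal sketch of the reported training configuration (assumption: illustrative
# only, not the released neural_programmer implementation).
import tensorflow as tf

HPARAMS = {
    "timesteps": 4,              # T = 4 operation/column selection steps
    "dim": 256,                  # word, operation, and RNN hidden vector size
    "init_range": 0.1,           # parameters initialized uniformly in [-0.1, 0.1]
    "batch_size": 20,
    "adam_epsilon": 1e-6,        # other Adam hyperparameters left at their defaults
    "max_grad_norm": 1.0,        # clip gradients to norm 1
    "weight_decay": 1e-4,        # weight decay penalty strength 0.0001 (added to the loss)
    "keep_prob_io": 0.8,         # dropout on RNN inputs/outputs, selector, op/column reps
    "keep_prob_recurrent": 0.9,  # recurrent dropout (Gal & Ghahramani, 2016)
    "keep_prob_word": 0.9,       # word dropout (Iyyer et al., 2015)
    "min_word_count": 10,        # rarer words replaced by a single unknown token
}

def training_op(loss, params):
    """Adam (epsilon=1e-6, other values default) with global-norm gradient clipping."""
    optimizer = tf.train.AdamOptimizer(epsilon=HPARAMS["adam_epsilon"])
    grads = tf.gradients(loss, params)
    clipped, _ = tf.clip_by_global_norm(grads, HPARAMS["max_grad_norm"])
    return optimizer.apply_gradients(list(zip(clipped, params)))
```

As a usage note, the weight decay penalty would be added to `loss` before calling `training_op`, and the three dropout keep probabilities apply to different parts of the model (input/output vectors and representations vs. recurrent connections vs. whole words), so they would be wired in at distinct points of the graph rather than through a single dropout layer.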