Reasoning with Memory Augmented Neural Networks for Language Comprehension
Authors: Tsendsuren Munkhdalai, Hong Yu
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied the proposed approach to language comprehension task by using Neural Semantic Encoders (NSE). Our NSE models achieved the state-of-the-art results showing an absolute improvement of 1.2% to 2.6% accuracy over previous results obtained by single and ensemble systems on standard machine comprehension benchmarks such as the Children's Book Test (CBT) and Who-Did-What (WDW) news article datasets. |
| Researcher Affiliation | Academia | Tsendsuren Munkhdalai & Hong Yu, University of Massachusetts Medical School, Bedford VAMC |
| Pseudocode | No | The paper describes the proposed approach in detail using mathematical equations and textual descriptions, but it does not provide any pseudocode or algorithm blocks. |
| Open Source Code | Yes | More detail on hyperparameters can be found in our code: https://bitbucket.org/tsendeemts/nse-rc |
| Open Datasets | Yes | We evaluated our models on two large-scale datasets: Children's Book Test (CBT) (Hill et al., 2015) and Who-Did-What (WDW) (Onishi et al., 2016). |
| Dataset Splits | Yes | Table 1: Statistics of the datasets. train (s): train strict, train (r): train relaxed and cands: candidates. Column groups: WDW (train (s), train (r), dev, test); CBT-NE (train, dev, test); CBT-CN (train, dev, test). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'A pre-trained 300-D Glove 840B vectors (Pennington et al., 2014)' but does not specify versions for any software libraries or frameworks used. |
| Experiment Setup | Yes | We used stochastic gradient descent with an Adam optimizer to train the models. The initial learning rate (lr) was set to 0.0005 for CBT-CN or 0.001 for other tasks. Pre-trained 300-D Glove 840B vectors (Pennington et al., 2014) were used to initialize the word embedding layer; therefore the embedding layer size is 300. The hidden layer size of the context embedding Bi-LSTM nets k = 436. The embeddings for out-of-vocabulary words and the model parameters were randomly initialized from the uniform distribution over [-0.1, 0.1). The gradient clipping threshold was set to 15. The models were regularized by applying 20% dropouts to the embedding layer. We used the batch size n = 32 for the CBT dataset and n = 25 for the WDW dataset and early stopping with a patience of 1 epoch. |
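
The Experiment Setup row quotes enough hyperparameter detail to sketch the training configuration. Below is a minimal, hedged PyTorch sketch of those settings; it is not the authors' implementation (the released code at https://bitbucket.org/tsendeemts/nse-rc may use a different framework and model definition), and `vocab_size`, `train_step`, and the loss function are illustrative placeholders, with a Bi-LSTM standing in for the full NSE reader.

```python
# Illustrative sketch of the quoted training setup (assumed PyTorch re-implementation).
import torch
import torch.nn as nn

EMBED_DIM = 300          # pre-trained 300-D GloVe 840B vectors
HIDDEN_DIM = 436         # hidden size k of the context-embedding Bi-LSTM
DROPOUT_P = 0.2          # 20% dropout applied to the embedding layer
GRAD_CLIP = 15.0         # gradient-clipping threshold
PATIENCE = 1             # early-stopping patience, in epochs (outer training loop)
LR = {"cbt-cn": 0.0005, "cbt-ne": 0.001, "wdw": 0.001}
BATCH_SIZE = {"cbt-cn": 32, "cbt-ne": 32, "wdw": 25}

vocab_size = 50_000      # placeholder; the real vocabulary comes from the dataset

# Parameters and OOV embeddings drawn from U[-0.1, 0.1); in-vocabulary rows
# would then be overwritten with the pre-trained GloVe vectors.
embedding = nn.Embedding(vocab_size, EMBED_DIM)
nn.init.uniform_(embedding.weight, -0.1, 0.1)

# Context-embedding Bi-LSTM (stand-in for the full NSE reader).
encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, bidirectional=True, batch_first=True)
dropout = nn.Dropout(DROPOUT_P)

params = list(embedding.parameters()) + list(encoder.parameters())
optimizer = torch.optim.Adam(params, lr=LR["cbt-cn"])

def train_step(token_ids, targets, loss_fn):
    """One Adam update with the quoted gradient-clipping threshold."""
    optimizer.zero_grad()
    states, _ = encoder(dropout(embedding(token_ids)))
    loss = loss_fn(states, targets)          # placeholder objective
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, GRAD_CLIP)
    optimizer.step()
    return loss.item()
```

The per-dataset learning rates and batch sizes follow the quoted values (0.0005 and batch 32 for CBT-CN, 0.001 elsewhere, batch 25 for WDW); early stopping with a patience of one epoch would be handled in the outer training loop.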