Neural Stored-program Memory
Authors: Hung Le, Truyen Tran, Svetha Venkatesh
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A wide range of experiments demonstrate that the resulting machines not only excel in classical algorithmic problems, but also have potential for compositional, continual, few-shot learning and question-answering tasks. To validate our proposal, the NTM armed with NSM, namely Neural Universal Turing Machine (NUTM), is tested on a variety of synthetic tasks including algorithmic tasks from Graves et al. (2014), composition of algorithmic tasks and continual procedure learning. For these algorithmic problems, we demonstrate clear improvements of NUTM over NTM. |
| Researcher Affiliation | Academia | Hung Le, Truyen Tran and Svetha Venkatesh Applied AI Institute, Deakin University, Geelong, Australia {lethai,truyen.tran,svetha.venkatesh}@deakin.edu.au |
| Pseudocode | Yes | Algorithm 1 Neural Universal Turing Machine |
| Open Source Code | No | The paper does not include any explicit statement about making its source code available or a link to a code repository. |
| Open Datasets | Yes | To validate our proposal, the NTM armed with NSM, namely Neural Universal Turing Machine (NUTM), is tested on a variety of synthetic tasks including algorithmic tasks from Graves et al. (2014) (...) using the Omniglot dataset to measure few-shot classification accuracy. (...) Following previous works of DNC, we use bAbI dataset (Weston et al., 2015) to measure the performance of the NUTM with DNC core. |
| Dataset Splits | No | The paper mentions training and testing data for different tasks (e.g., 'For each task, we pick 1,000 longer sequences as test data' for NTM tasks, and 'After 100,000 episodes of training, the models are tested with unseen images from the testing set' for Omniglot). However, it does not explicitly specify exact percentages, sample counts, or detailed methodology for splitting data into train, validation, and test sets across all experiments to guarantee reproducibility. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions software components like 'RMSprop optimizer', 'layer normalization', and 'Gumbel-softmax', but it does not provide specific version numbers for these or any other software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | For fair comparison, the controller hidden dimension of NUTM is set smaller to make the total number of parameters of NUTM equivalent to that of NTM. The number of memory heads for both models are always equal and set to the same value as in the original paper (details in App. C). We train the models using RMSprop optimizer with fixed learning rate of 10⁻⁴ and momentum of 0.9. The batch size is 32 and we adopt layer normalization (Lei Ba et al., 2016) to DNC’s layers. The details of hyper-parameters are listed in Table 12. |
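
The Experiment Setup row above pins down the reported optimization settings: RMSprop with a fixed learning rate of 10⁻⁴ and momentum 0.9, batch size 32, and layer normalization applied to the DNC layers. Since the paper does not release code, the following is only a minimal PyTorch sketch of that training configuration; the small LSTM stand-in model, the synthetic copy-style data, and the loss choice are assumptions introduced for illustration, not the authors' implementation.

```python
# Hedged sketch of the reported training configuration, not the authors' code.
# Only the optimizer choice (RMSprop, lr=1e-4, momentum=0.9), the batch size of 32,
# and the use of layer normalization are taken from the paper; the stand-in model
# and synthetic binary-sequence data below are placeholders.
import torch
import torch.nn as nn

BATCH_SIZE = 32
SEQ_LEN, INPUT_DIM, HIDDEN_DIM = 20, 8, 64


class StandInController(nn.Module):
    """Placeholder for the NUTM/DNC model, which the paper does not release."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(INPUT_DIM, HIDDEN_DIM, batch_first=True)
        self.norm = nn.LayerNorm(HIDDEN_DIM)  # paper reports layer normalization on DNC layers
        self.head = nn.Linear(HIDDEN_DIM, INPUT_DIM)

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.head(self.norm(h))


model = StandInController()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4, momentum=0.9)
criterion = nn.BCEWithLogitsLoss()  # assumption: binary targets, as in NTM-style copy tasks

for step in range(100):  # the number of training steps is task-dependent in the paper
    # synthetic binary sequences standing in for the algorithmic-task data
    x = torch.randint(0, 2, (BATCH_SIZE, SEQ_LEN, INPUT_DIM)).float()
    loss = criterion(model(x), x)  # toy objective: reconstruct the input sequence
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Even with the optimizer settings fixed as above, full reproduction would still require the per-task hyper-parameters the paper defers to its Table 12 and Appendix C.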