Learning to Remember More with Less Memorization
Authors: Hung Le, Truyen Tran, Svetha Venkatesh
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an extensive set of experiments, we empirically demonstrate the advantages of our solutions over other recurrent architectures, claiming the state-of-the-arts in various sequential modeling tasks. |
| Researcher Affiliation | Academia | Hung Le, Truyen Tran and Svetha Venkatesh Applied AI Institute, Deakin University, Geelong, Australia {lethai,truyen.tran,svetha.venkatesh}@deakin.edu.au |
| Pseudocode | Yes | Algorithm 1 Cached Uniform Writing (a simplified sketch of the writing scheme follows the table) |
| Open Source Code | No | The paper provides links to repositories for the baseline models (DNC and NTM) that the authors reimplemented or used, but it does not provide a link to, or an explicit statement about releasing, the source code for the proposed methods (UW and CUW). |
| Open Datasets | Yes | The chosen benchmark is a pixel-by-pixel image classification task on MNIST... The training, validation and testing sizes are 50,000, 10,000 and 10,000, respectively. The datasets used in this experiment are common big datasets where the number of documents is between 120,000 and 1,400,000 with maximum of 4,392 words per document (see Appendix L for further details). |
| Dataset Splits | Yes | The training, validation and testing sizes are 50,000, 10,000 and 10,000, respectively. Early-stop training is applied if there is no improvement after 5 epochs in the validation set. |
| Hardware Specification | No | The paper discusses computation time but does not provide specific hardware details (e.g., CPU, GPU, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers like Adam and RMSprop and refers to reimplementations based on existing GitHub projects, but it does not specify any software versions for libraries, frameworks, or programming languages (e.g., 'Python 3.x', 'PyTorch 1.x'). |
| Experiment Setup | Yes | The training stops after 10,000 iterations of batch size 64. We use Adam optimizer (Kingma & Ba, 2014) with initial learning rate and gradient clipping of {0.001, 0.0001} and {1, 5, 10}, respectively. The controllers are implemented as single layer GRU with 100-dimensional hidden vector. To optimize the models, we use RMSprop with initial learning rate of 0.0001. (A minimal configuration sketch follows the table.) |
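The paper's Algorithm 1 (Cached Uniform Writing) is not reproduced here, but the idea it names, writing to external memory only at evenly spaced steps and compressing the hidden states cached between writes, can be sketched roughly as below. This is a minimal PyTorch sketch under those assumptions, not the authors' implementation: `CachedUniformWriter`, `uniform_write_steps`, and the soft-attention summary of the cache are illustrative stand-ins for the paper's memory-augmented controller, write schedule, and local attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def uniform_write_steps(seq_len, num_writes):
    """Evenly spaced write positions over the sequence. The paper derives the
    exact spacing; this only illustrates the uniform-interval idea."""
    interval = seq_len / num_writes
    return {int(round((i + 1) * interval)) - 1 for i in range(num_writes)}


class CachedUniformWriter(nn.Module):
    """Simplified sketch (not the authors' code): a GRU controller that writes
    to memory only at scheduled steps, compressing the hidden states cached
    since the last write with soft attention before each write."""

    def __init__(self, input_dim, hidden_dim, num_writes):
        super().__init__()
        self.cell = nn.GRUCell(input_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)    # scores cached hidden states
        self.num_writes = num_writes

    def forward(self, x):                        # x: (seq_len, input_dim)
        seq_len = x.size(0)
        write_steps = uniform_write_steps(seq_len, self.num_writes)
        h = x.new_zeros(1, self.cell.hidden_size)
        cache, memory_rows = [], []
        for t in range(seq_len):
            h = self.cell(x[t:t + 1], h)         # one controller step
            cache.append(h.squeeze(0))
            if t in write_steps:                 # scheduled memory write
                stacked = torch.stack(cache)     # (cache_len, hidden_dim)
                weights = F.softmax(self.score(stacked), dim=0)
                memory_rows.append((weights * stacked).sum(dim=0))
                cache = []                       # clear the local cache
        return h, torch.stack(memory_rows)       # final state, memory matrix
```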
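For the quoted experiment setup, a minimal training-loop sketch under those hyperparameters follows. It is illustrative only: the quoted settings span at least two different experiments, and the sketch combines the Adam/clipping/batch settings with a plain single-layer 100-dimensional GRU controller in place of the paper's memory-augmented models. The single values picked from the searched grids ({0.001, 0.0001} for the learning rate, {1, 5, 10} for clipping), the input dimensionality, the random placeholder data, and the placeholder loss are all assumptions.

```python
import torch
import torch.nn as nn

# Values taken from the quoted setup; grid choices and data are assumptions.
HIDDEN_DIM, BATCH_SIZE, NUM_ITERS = 100, 64, 10_000
LEARNING_RATE, CLIP_NORM = 0.001, 10.0           # picked from {0.001, 0.0001} and {1, 5, 10}

controller = nn.GRU(input_size=1, hidden_size=HIDDEN_DIM, num_layers=1)
optimizer = torch.optim.Adam(controller.parameters(), lr=LEARNING_RATE)

for step in range(NUM_ITERS):
    x = torch.randn(784, BATCH_SIZE, 1)          # placeholder pixel-by-pixel batch
    output, h_n = controller(x)
    loss = output.pow(2).mean()                  # placeholder loss for illustration
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(controller.parameters(), CLIP_NORM)
    optimizer.step()
```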