Learning to Remember More with Less Memorization
Authors: Hung Le, Truyen Tran, Svetha Venkatesh
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an extensive set of experiments, we empirically demonstrate the advantages of our solutions over other recurrent architectures, claiming the state-of-the-arts in various sequential modeling tasks. |
| Researcher Affiliation | Academia | Hung Le, Truyen Tran and Svetha Venkatesh Applied AI Institute, Deakin University, Geelong, Australia {lethai,truyen.tran,svetha.venkatesh}@deakin.edu.au |
| Pseudocode | Yes | Algorithm 1 Cached Uniform Writing (a simplified sketch of the writing scheme follows the table) |
| Open Source Code | No | The paper provides links to repositories for the baseline models (DNC and NTM) that the authors reimplemented or used, but it does not provide a link to, or an explicit statement about releasing, the source code for the proposed methods (UW and CUW). |
| Open Datasets | Yes | The chosen benchmark is a pixel-by-pixel image classification task on MNIST... The training, validation and testing sizes are 50,000, 10,000 and 10,000, respectively. The datasets used in this experiment are common big datasets where the number of documents is between 120,000 and 1,400,000 with maximum of 4,392 words per document (see Appendix L for further details). |
| Dataset Splits | Yes | The training, validation and testing sizes are 50,000, 10,000 and 10,000, respectively. Early-stop training is applied if there is no improvement after 5 epochs in the validation set. |
| Hardware Specification | No | The paper discusses computation time but does not provide specific hardware details (e.g., CPU, GPU, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers like Adam and RMSprop and refers to reimplementations based on existing GitHub projects, but it does not specify any software versions for libraries, frameworks, or programming languages (e.g., 'Python 3.x', 'PyTorch 1.x'). |
| Experiment Setup | Yes | The training stops after 10,000 iterations of batch size 64. We use Adam optimizer (Kingma & Ba, 2014) with initial learning rate and gradient clipping of {0.001, 0.0001} and {1, 5, 10}, respectively. The controllers are implemented as single layer GRU with 100-dimensional hidden vector. To optimize the models, we use RMSprop with initial learning rate of 0.0001. (A minimal configuration sketch follows the table.) |
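The paper's Algorithm 1 (Cached Uniform Writing) is not reproduced here, but the idea it names, writing to external memory only at evenly spaced steps and compressing the hidden states cached between writes, can be sketched roughly as below. This is a minimal PyTorch sketch under those assumptions, not the authors' implementation: `CachedUniformWriter`, `uniform_write_steps`, and the soft-attention summary of the cache are illustrative stand-ins for the paper's memory-augmented controller, write schedule, and local attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def uniform_write_steps(seq_len, num_writes):
    """Evenly spaced write positions over the sequence. The paper derives the
    exact spacing; this only illustrates the uniform-interval idea."""
    interval = seq_len / num_writes
    return {int(round((i + 1) * interval)) - 1 for i in range(num_writes)}


class CachedUniformWriter(nn.Module):
    """Simplified sketch (not the authors' code): a GRU controller that writes
    to memory only at scheduled steps, compressing the hidden states cached
    since the last write with soft attention before each write."""

    def __init__(self, input_dim, hidden_dim, num_writes):
        super().__init__()
        self.cell = nn.GRUCell(input_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)    # scores cached hidden states
        self.num_writes = num_writes

    def forward(self, x):                        # x: (seq_len, input_dim)
        seq_len = x.size(0)
        write_steps = uniform_write_steps(seq_len, self.num_writes)
        h = x.new_zeros(1, self.cell.hidden_size)
        cache, memory_rows = [], []
        for t in range(seq_len):
            h = self.cell(x[t:t + 1], h)         # one controller step
            cache.append(h.squeeze(0))
            if t in write_steps:                 # scheduled memory write
                stacked = torch.stack(cache)     # (cache_len, hidden_dim)
                weights = F.softmax(self.score(stacked), dim=0)
                memory_rows.append((weights * stacked).sum(dim=0))
                cache = []                       # clear the local cache
        return h, torch.stack(memory_rows)       # final state, memory matrix
```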
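For the quoted experiment setup, a minimal training-loop sketch under those hyperparameters follows. It is illustrative only: the quoted settings span at least two different experiments, and the sketch combines the Adam/clipping/batch settings with a plain single-layer 100-dimensional GRU controller in place of the paper's memory-augmented models. The single values picked from the searched grids ({0.001, 0.0001} for the learning rate, {1, 5, 10} for clipping), the input dimensionality, the random placeholder data, and the placeholder loss are all assumptions.

```python
import torch
import torch.nn as nn

# Values taken from the quoted setup; grid choices and data are assumptions.
HIDDEN_DIM, BATCH_SIZE, NUM_ITERS = 100, 64, 10_000
LEARNING_RATE, CLIP_NORM = 0.001, 10.0           # picked from {0.001, 0.0001} and {1, 5, 10}

controller = nn.GRU(input_size=1, hidden_size=HIDDEN_DIM, num_layers=1)
optimizer = torch.optim.Adam(controller.parameters(), lr=LEARNING_RATE)

for step in range(NUM_ITERS):
    x = torch.randn(784, BATCH_SIZE, 1)          # placeholder pixel-by-pixel batch
    output, h_n = controller(x)
    loss = output.pow(2).mean()                  # placeholder loss for illustration
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(controller.parameters(), CLIP_NORM)
    optimizer.step()
```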