Learning to Remember Rare Events
Authors: Lukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We set new state-of-the-art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task. (Section 4, Experiments:) We perform experiments using all three architectures described above. We experiment both on real-world data and on synthetic tasks that give us some insight into the performance and limitations of the memory module. |
| Researcher Affiliation | Collaboration | Łukasz Kaiser (Google Brain, lukaszkaiser@google.com); Ofir Nachum (Google Brain, ofirnachum@google.com); Aurko Roy (Georgia Tech, aurko@gatech.edu); Samy Bengio (Google Brain, bengio@google.com) |
| Pseudocode | Yes | The full operation of the memory module is depicted in Figure 1. |
| Open Source Code | Yes | The source code for the memory module, together with our settings for Omniglot, is available on GitHub: https://github.com/tensorflow/models/tree/master/learning_to_remember_rare_events |
| Open Datasets | Yes | We evaluate on the well-known one-shot learning task Omniglot, which is the only dataset with explicit one-shot learning evaluation. This dataset is small and does not benefit from life-long learning capability of our module, but we still exceed the best previous results and set new state-of-the-art. |
| Dataset Splits | No | The paper mentions training and testing phases and data splits for specific evaluations (e.g., Omniglot uses 1200 characters for training and the remainder for evaluation; the WMT test set is split into even and odd lines), but it does not explicitly describe a distinct validation split or general train/validation/test splits for reproduction. |
| Hardware Specification | No | The paper mentions running experiments 'on GPUs' but does not provide specific details on the GPU models, CPU types, or other hardware specifications used. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and that source code is available on GitHub, but it does not specify software dependencies with version numbers (e.g., Python, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | In all our experiments we use the Adam optimizer (Kingma & Ba, 2014) and the parameters for the memory module remain unchanged (k = 256, α = 0.1). The parameter t denotes the inverse of softmax temperature and we set it to t = 40 in our experiments. Convolutional network with memory: all convolutions use 3x3 filters with 64 channels in the first pair, and 128 in the second; the fully connected layers have dimension 256 and dropout applied between them. Synthetic task: we use a small Extended Neural GPU with 32 channels and memory of size half a million. (Hedged sketches of the memory lookup and the convolutional embedding are given below the table.) |
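
The quoted setup fixes k = 256 neighbours and an inverse softmax temperature of t = 40 for the memory module. The following is a minimal sketch of what such a nearest-neighbour lookup could look like, assuming L2-normalised keys, cosine similarities, and a softmax over the top-k matches; the function name `memory_query` and the normalisation details are illustrative assumptions, not the released TensorFlow implementation.

```python
# Hypothetical sketch of the memory module's nearest-neighbour query,
# using k = 256 and inverse softmax temperature t = 40 as quoted above.
# The cosine-similarity / top-k formulation is an assumption, not a copy
# of the code released in the paper's repository.
import numpy as np

def memory_query(query, mem_keys, mem_values, k=256, t=40.0):
    """Return the top-k memory values and softmax weights for one query.

    query:      (dim,) embedding produced by the network.
    mem_keys:   (mem_size, dim) stored keys, assumed L2-normalised.
    mem_values: (mem_size,) stored labels / values.
    """
    q = query / (np.linalg.norm(query) + 1e-8)   # normalise the query
    sims = mem_keys @ q                          # cosine similarities
    top_k = np.argsort(-sims)[:k]                # k nearest neighbours
    logits = t * sims[top_k]                     # scale by inverse temperature t
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                     # softmax over the k hits
    return mem_values[top_k], weights
```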
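
Similarly, the quoted convolutional settings (two pairs of 3x3 convolutions with 64 and then 128 channels, 256-dimensional fully connected layers with dropout, trained with Adam) can be sketched as below. Pooling placement, activation functions, input shape, dropout rate, and learning rate are not stated in the excerpt and are assumptions here; the memory module itself is omitted.

```python
# A minimal sketch of the embedding network described in the
# "Experiment Setup" row.  Architectural details beyond the quoted
# filter sizes, channel counts, and layer widths are assumptions.
import tensorflow as tf

def build_embedding_net(input_shape=(28, 28, 1), embedding_dim=256):
    # input_shape is an assumption; the excerpt does not state it.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.5),            # dropout rate not given in the excerpt
        tf.keras.layers.Dense(embedding_dim),
    ])

model = build_embedding_net()
optimizer = tf.keras.optimizers.Adam()           # learning rate not specified in the excerpt
```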