Learning to Remember Rare Events
Authors: Lukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We set new state-of-the-art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task. (Section 4, Experiments:) We perform experiments using all three architectures described above. We experiment both on real-world data and on synthetic tasks that give us some insight into the performance and limitations of the memory module. |
| Researcher Affiliation | Collaboration | Łukasz Kaiser (Google Brain, lukaszkaiser@google.com); Ofir Nachum (Google Brain, ofirnachum@google.com); Aurko Roy (Georgia Tech, aurko@gatech.edu); Samy Bengio (Google Brain, bengio@google.com) |
| Pseudocode | Yes | The full operation of the memory module is depicted in Figure 1. |
| Open Source Code | Yes | The source code for the memory module, together with our settings for Omniglot, is available on GitHub: https://github.com/tensorflow/models/tree/master/learning_to_remember_rare_events |
| Open Datasets | Yes | We evaluate on the well-known one-shot learning task Omniglot, which is the only dataset with explicit one-shot learning evaluation. This dataset is small and does not benefit from life-long learning capability of our module, but we still exceed the best previous results and set new state-of-the-art. |
| Dataset Splits | No | The paper mentions training and testing phases and data splits for specific evaluations (e.g., Omniglot uses 1200 characters for training and the remainder for evaluation; the WMT test set is split into even and odd lines), but it does not explicitly describe a distinct validation split or general train/validation/test splits for reproduction. |
| Hardware Specification | No | The paper mentions running experiments 'on GPUs' but does not provide specific details on the GPU models, CPU types, or other hardware specifications used. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and that source code is available on GitHub, but it does not specify software dependencies with version numbers (e.g., Python, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | In all our experiments we use the Adam optimizer (Kingma & Ba, 2014) and the parameters for the memory module remain unchanged (k = 256, α = 0.1). The parameter t denotes the inverse of softmax temperature and we set it to t = 40 in our experiments. Convolutional network with memory: all convolutions use 3x3 filters with 64 channels in the first pair, and 128 in the second; the fully connected layers have dimension 256 and dropout applied between them. Synthetic task: we use a small Extended Neural GPU with 32 channels and memory of size half a million. (Hedged sketches of the memory lookup and the convolutional embedding are given below the table.) |
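
The quoted setup fixes k = 256 neighbours and an inverse softmax temperature of t = 40 for the memory module. The following is a minimal sketch of what such a nearest-neighbour lookup could look like, assuming L2-normalised keys, cosine similarities, and a softmax over the top-k matches; the function name `memory_query` and the normalisation details are illustrative assumptions, not the released TensorFlow implementation.

```python
# Hypothetical sketch of the memory module's nearest-neighbour query,
# using k = 256 and inverse softmax temperature t = 40 as quoted above.
# The cosine-similarity / top-k formulation is an assumption, not a copy
# of the code released in the paper's repository.
import numpy as np

def memory_query(query, mem_keys, mem_values, k=256, t=40.0):
    """Return the top-k memory values and softmax weights for one query.

    query:      (dim,) embedding produced by the network.
    mem_keys:   (mem_size, dim) stored keys, assumed L2-normalised.
    mem_values: (mem_size,) stored labels / values.
    """
    q = query / (np.linalg.norm(query) + 1e-8)   # normalise the query
    sims = mem_keys @ q                          # cosine similarities
    top_k = np.argsort(-sims)[:k]                # k nearest neighbours
    logits = t * sims[top_k]                     # scale by inverse temperature t
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                     # softmax over the k hits
    return mem_values[top_k], weights
```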
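
Similarly, the quoted convolutional settings (two pairs of 3x3 convolutions with 64 and then 128 channels, 256-dimensional fully connected layers with dropout, trained with Adam) can be sketched as below. Pooling placement, activation functions, input shape, dropout rate, and learning rate are not stated in the excerpt and are assumptions here; the memory module itself is omitted.

```python
# A minimal sketch of the embedding network described in the
# "Experiment Setup" row.  Architectural details beyond the quoted
# filter sizes, channel counts, and layer widths are assumptions.
import tensorflow as tf

def build_embedding_net(input_shape=(28, 28, 1), embedding_dim=256):
    # input_shape is an assumption; the excerpt does not state it.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.5),            # dropout rate not given in the excerpt
        tf.keras.layers.Dense(embedding_dim),
    ])

model = build_embedding_net()
optimizer = tf.keras.optimizers.Adam()           # learning rate not specified in the excerpt
```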