Meta-Learning Deep Energy-Based Memory Models

Authors: Sergey Bartunov, Jack Rae, Simon Osindero, Timothy Lillicrap

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate experimentally that our EBMM approach can build compressed memories for synthetic and natural data, and is capable of associative retrieval that outperforms existing memory systems in terms of the reconstruction error and compression rate. ... In this section we experimentally evaluate EBMM on a number of real-world image datasets.
Researcher Affiliation | Industry | Sergey Bartunov, DeepMind, London, United Kingdom, bartunov@google.com; Jack W. Rae, DeepMind, London, United Kingdom, jwrae@google.com; Simon Osindero, DeepMind, London, United Kingdom, osindero@google.com; Timothy P. Lillicrap, DeepMind, London, United Kingdom, countzero@google.com
Pseudocode | Yes | C ARCHITECTURE DETAILS ... hidden_size = 1024 input_size = 128 # 128 * (128 - 1) / 2 + 128 parameters in total dynamic_size = (input_size + 1) // 2 state = repeat_batch(zeros(hidden_size)) memory = Linear(input_size, dynamic_size) (a runnable reading of this snippet is sketched after the table)
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We begin with experiments on the Omniglot dataset (Lake et al., 2015) ... We conducted a similar study on the CIFAR dataset. ... further investigate the ability of EBMM to handle complex visual datasets by applying the model to 64×64 ImageNet. ... A natural question is whether a model trained on one task or dataset strictly overfits to its features or whether it can generalize to similar, but previously unseen tasks. One of the standard experiments to test this ability is transfer from Omniglot to MNIST
Dataset Splits | Yes | Meta-learning has been performed on the canonical train split of each dataset, with testing on the corresponding test split. (see the loading sketch after the table)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models) used to run its experiments.
Software Dependencies | No | The paper mentions the 'AdamW optimizer (Loshchilov & Hutter, 2017)' but does not specify version numbers for programming languages, libraries, or frameworks used (e.g., Python, TensorFlow, PyTorch).
Experiment Setup | Yes | In all experiments EBMM used K = 5 read iterations and T = 5 write iterations. ... We train all models using the AdamW optimizer (Loshchilov & Hutter, 2017) with learning rate 5 × 10⁻⁵ and weight decay 10⁻⁶, with all other parameters set to AdamW defaults. We also apply gradient clipping by global norm at 0.05. ... All models were allowed to train for 2 × 10⁶ gradient updates or 1 week, whichever ended first. ... We use an individual learning rate for each writable layer and each of the three loss terms, initialized at 10⁻⁴ and learned together with the other parameters. (see the optimizer sketch after the table)
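
The pseudocode row quotes Appendix C only in fragments. Below is a minimal PyTorch sketch of one consistent reading (ours, not the authors' released code; the choice of PyTorch and the batch size are assumptions): with dynamic_size = (input_size + 1) // 2, the writable Linear layer carries exactly 128 * (128 - 1) / 2 + 128 = 8256 parameters, matching the total stated in the quoted comment.

    # Minimal PyTorch sketch (our reading of the quoted pseudocode, not the authors' code).
    import torch
    import torch.nn as nn

    hidden_size = 1024
    input_size = 128
    dynamic_size = (input_size + 1) // 2          # 64

    memory = nn.Linear(input_size, dynamic_size)  # writable (fast) parameters: 64*128 weights + 64 biases
    batch_size = 16                               # arbitrary choice; stands in for repeat_batch(...)
    state = torch.zeros(batch_size, hidden_size)  # recurrent state replicated across the batch

    n_writable = sum(p.numel() for p in memory.parameters())
    assert n_writable == input_size * (input_size - 1) // 2 + input_size  # 8256, as in the quoted comment

Under this reading, the writable budget equals the free parameters of a symmetric 128×128 weight matrix plus 128 per-unit terms, which is exactly what the expression 128 * (128 - 1) / 2 + 128 counts.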
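
For the dataset-splits row, the paper does not name a loading library; as a hedged illustration, torchvision's Omniglot class (our choice) exposes the canonical background/evaluation split, which this sketch treats as the train/test split.

    # Hedged sketch: canonical Omniglot splits via torchvision (an assumption, not the authors' pipeline).
    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()
    # Background alphabets serve as the train split, evaluation alphabets as the test split.
    train_set = datasets.Omniglot(root="./data", background=True, transform=to_tensor, download=True)
    test_set = datasets.Omniglot(root="./data", background=False, transform=to_tensor, download=True)
    print(len(train_set), len(test_set))  # 19280 background images, 13180 evaluation images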
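
For the experiment-setup row, the quoted outer-loop optimizer settings can be written down directly. The sketch below is ours, not the authors' training code; meta_model is a placeholder module and the dummy loss exists only so the snippet runs.

    # Hedged sketch of the quoted outer-loop settings.
    import torch

    meta_model = torch.nn.Linear(128, 128)  # placeholder standing in for the EBMM meta-parameters

    optimizer = torch.optim.AdamW(
        meta_model.parameters(),
        lr=5e-5,            # learning rate 5 × 10⁻⁵
        weight_decay=1e-6,  # weight decay 10⁻⁶
    )                       # remaining AdamW arguments left at their defaults

    loss = meta_model(torch.randn(8, 128)).pow(2).mean()  # dummy loss so the update step runs
    loss.backward()
    torch.nn.utils.clip_grad_norm_(meta_model.parameters(), max_norm=0.05)  # clip global norm at 0.05
    optimizer.step()
    optimizer.zero_grad()

The K = 5 read and T = 5 write iterations, and the learned per-layer, per-loss inner learning rates initialized at 10⁻⁴, belong to the inner write/read loops and are not shown here.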