Online Meta-Learning via Learning with Layer-Distributed Memory
Authors: Sudarshan Babu, Pedro Savarese, Michael Maire
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that efficient meta-learning can be achieved via end-to-end training of deep neural networks with memory distributed across layers. The persistent state of this memory assumes the entire burden of guiding task adaptation. Moreover, its distributed nature is instrumental in orchestrating adaptation. Ablation experiments demonstrate that providing relevant feedback to memory units distributed across the depth of the network enables them to guide adaptation throughout the entire network. Our results show that this is a successful strategy for simplifying meta-learning, often cast as a bi-level optimization problem, to standard end-to-end training, while outperforming gradient-based, prototype-based, and other memory-based meta-learning strategies. (See the architecture sketch after the table.) |
| Researcher Affiliation | Academia | Sudarshan Babu (TTI-C, sudarshan@ttic.edu); Pedro Savarese (TTI-C, savarese@ttic.edu); Michael Maire (University of Chicago, mmaire@uchicago.edu) |
| Pseudocode | Yes | Following Santoro et al. [13], we perform episodic training by exposing the model to a variety of tasks from the training distribution P(T_train). For a given task, the model incurs a loss L_i at every time step of the task; we sum these losses and backpropagate through the sum at the end of the task. This is detailed in Algorithm 1 in Appendix A. We evaluate the model using a partition of the dataset that is class-wise disjoint from the training partition. The model makes a prediction at every time step and adapts to the sequence by using its own hidden states, thereby not requiring any gradient information for adaptation. Algorithm 2 in Appendix A provides details. (See the episodic training sketch after the table.) |
| Open Source Code | No | The paper does not contain any explicit statements about making the source code available or provide a link to a code repository. |
| Open Datasets | Yes | We use CIFAR-FS [46] and Omniglot [47] datasets for our few-shot learning tasks; see Appendix A for details. [47] Brenden Lake. Omniglot git repo, 2015. URL https://github.com/brendenlake/omniglot/raw/master/python. |
| Dataset Splits | No | The paper mentions evaluating the model using 'a partition of the dataset that is class-wise disjoint from the training partition' and using 'the same class-wise disjoint train/test split as in Lake [47]', but it does not provide specific percentages, sample counts, or detailed methodology for the training, validation, and test splits in the main text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud configurations) used to run its experiments. |
| Software Dependencies | No | The paper mentions using certain architectural components like Convolutional LSTMs and LSTMs, and optimization methods like Adam, but it does not specify any software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | No | The paper describes the model architecture and general training procedures, such as episodic training and label encoding, and mentions a 'simple curriculum of increasing task length every 5K episodes'. However, it does not provide specific hyperparameter values such as learning rate, batch size, or number of training episodes in the main text, often deferring such details to the appendices. (See the curriculum sketch after the table.) |
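
The Research Type row quotes the paper's central claim: all task adaptation is carried by persistent memory distributed across the network's layers, with label feedback reaching memory units throughout the depth. The paper provides no code, so the following is a minimal PyTorch-style sketch of that idea under stated assumptions: plain `nn.LSTMCell` layers stand in for the convolutional LSTMs the paper mentions, the class name `LayerDistributedMemoryNet` and all dimensions are hypothetical, and concatenating the previous label to every layer's input is our reading of the "feedback to memory units distributed across the depth" ablation, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class LayerDistributedMemoryNet(nn.Module):
    """Sketch: a network whose only adaptation mechanism is the persistent
    (h, c) state held by a recurrent cell at every layer. The paper uses
    convolutional LSTMs; plain LSTMCells keep this sketch short."""

    def __init__(self, in_dim, hidden_dim, n_classes, n_layers=4):
        super().__init__()
        self.n_classes = n_classes
        # Label feedback is concatenated to every layer's input so that
        # memory units throughout the depth receive relevant feedback.
        dims = [in_dim] + [hidden_dim] * (n_layers - 1)
        self.cells = nn.ModuleList(
            [nn.LSTMCell(d + n_classes, hidden_dim) for d in dims]
        )
        self.head = nn.Linear(hidden_dim, n_classes)

    def init_state(self, batch_size, device):
        # Fresh memory at the start of each task/episode.
        return [
            (torch.zeros(batch_size, cell.hidden_size, device=device),
             torch.zeros(batch_size, cell.hidden_size, device=device))
            for cell in self.cells
        ]

    def forward(self, x, prev_label, state):
        # x: (B, in_dim) flattened image features.
        # prev_label: (B, n_classes) one-hot label of the *previous* example,
        # following the time-offset feedback convention of Santoro et al. [13].
        new_state, h = [], x
        for cell, (h_prev, c_prev) in zip(self.cells, state):
            h_prev, c_prev = cell(torch.cat([h, prev_label], dim=1),
                                  (h_prev, c_prev))
            new_state.append((h_prev, c_prev))
            h = h_prev
        return self.head(h), new_state
```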
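The Pseudocode row summarizes Algorithms 1 and 2: episodic training that sums per-step losses and backpropagates once per task, and gradient-free evaluation in which adaptation comes only from hidden states. The sketch below reuses the hypothetical model class above; `run_episode`, the tensor shapes, and the argument names are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F


def run_episode(model, images, labels, optimizer=None, device="cpu"):
    """One task: predict at every time step, sum the per-step losses, and
    backpropagate through the sum at the end of the task (Algorithm 1).
    With optimizer=None this mirrors Algorithm 2: no gradients are used and
    adaptation comes only from the persistent hidden states."""
    T, B = labels.shape                       # images: (T, B, in_dim)
    state = model.init_state(B, device)
    prev_label = torch.zeros(B, model.n_classes, device=device)
    total_loss, correct = 0.0, 0
    with torch.set_grad_enabled(optimizer is not None):
        for t in range(T):
            x, y = images[t].to(device), labels[t].to(device)
            logits, state = model(x, prev_label, state)
            total_loss = total_loss + F.cross_entropy(logits, y)
            correct += (logits.argmax(dim=1) == y).sum().item()
            # Reveal the true label at the next step (time-offset feedback).
            prev_label = F.one_hot(y, model.n_classes).float()
    if optimizer is not None:                 # training: backprop the summed loss
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()
    return total_loss.item() / T, correct / (T * B)
```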
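The Experiment Setup row notes only that task length grows every 5K episodes. The toy schedule below illustrates that statement; `base_len`, `step`, and `max_len` are placeholders, as the paper's main text gives no values for them.

```python
def task_length(episode_idx, base_len=10, step=5, max_len=50, every=5_000):
    """Toy curriculum: lengthen tasks every `every` episodes. Only the 5K
    interval comes from the paper; the other numbers are illustrative."""
    return min(base_len + step * (episode_idx // every), max_len)


# e.g. episodes 0-4999 use length 10, episodes 5000-9999 use length 15, ...
```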