Learning where to learn: Gradient sparsity in meta and continual learning

Authors: Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis. This selective sparsity results in better generalization and less interference in a range of few-shot and continual learning problems. Moreover, we find that sparse learning also emerges in a more expressive model where learning rates are meta-learned. (See the masked-update sketch after the table.)
Researcher Affiliation | Collaboration | 1 Institute of Neuroinformatics, University of Zürich and ETH Zürich; 2 Mila, University of Montreal & ServiceNow
Pseudocode | Yes | A detailed derivation of the mask update, alongside the presentation of the initialization update, can be found in the supplementary material (SM). ... complete pseudocode is provided in the SM.
Open Source Code | Yes | Source code available at: https://github.com/Johswald/learning_where_to_learn
Open Datasets | Yes | We apply sparse-MAML to the standard few-shot learning benchmark based on the miniImageNet dataset [47]... We study the three MNIST [30] continual learning problems... Full details as well as additional experiments using the CIFAR-10 [28] dataset are provided in the SM. ... a sequence of 10000 examples from the Omniglot [29], MNIST [30] and Fashion-MNIST [59] datasets is presented for online learning
Dataset Splits | Yes | During meta-learning, the data of a given task τ is split into training and validation datasets, D^t_τ and D^v_τ, respectively. (See the task-split sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU models, or cloud instance types) used for running its experiments. It states, 'All experimental details can be found in the SM.'
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8'). It references 'PyTorch: an imperative style, high-performance deep learning library. [43]' but does not state the version used for their work.
Experiment Setup | Yes | Our experimental setup follows refs. [11, 56] unless stated otherwise. In particular, by default, our experimental results are obtained using the standard 4-convolutional-layer neural network (ConvNet) model that has been intensively used to benchmark meta-learning algorithms. As is also conventional, we consider two data regimes: 5-shot 5-way and 1-shot 5-way... We perform a grid search over the learning rates α_0 and γ_m, and search for best continual learning performance, not sparsity (cf. SM). The remaining hyperparameters are kept to the values provided in [15]. (See the ConvNet sketch after the table.)
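
To make the "learning where to learn" mechanism quoted in the Research Type and Pseudocode rows concrete, here is a minimal sketch of a masked MAML-style inner step in PyTorch. It assumes binary masks obtained by thresholding meta-learned scores at zero with a straight-through gradient; the function names and the toy linear model are hypothetical, and the paper's exact mask-update rule is derived in its SM rather than here.

```python
# Minimal sketch of a sparse (masked) MAML inner-loop step.
# Assumption (not taken from the paper): masks are hard 0/1 thresholds of
# meta-learned scores, with a straight-through gradient so the scores can
# be updated by the outer (meta) loss.
import torch


def binary_mask(scores):
    """Hard 0/1 mask with a straight-through gradient to the scores."""
    hard = (scores > 0).float()
    return hard + scores - scores.detach()


def inner_step(params, scores, loss_fn, batch, alpha=0.1):
    """One masked inner-loop step: only weights whose mask is 1 are changed."""
    loss = loss_fn(params, batch)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return [p - alpha * binary_mask(s) * g
            for p, s, g in zip(params, scores, grads)]


# Toy usage on a linear model: adapt on D^t_tau, evaluate on D^v_tau.
w = torch.zeros(5, requires_grad=True)          # meta-learned initialization
s = torch.full((5,), 0.01, requires_grad=True)  # meta-learned mask scores


def loss_fn(params, batch):
    x, y = batch
    return ((x @ params[0] - y) ** 2).mean()


d_train = (torch.randn(8, 5), torch.randn(8))   # D^t_tau (support set)
d_val = (torch.randn(8, 5), torch.randn(8))     # D^v_tau (query set)

adapted = inner_step([w], [s], loss_fn, d_train)
outer_loss = loss_fn(adapted, d_val)
outer_loss.backward()   # outer gradients reach both w and the mask scores s
```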
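The per-task split quoted in the Dataset Splits row follows the conventional episodic construction for few-shot learning. The helper below is a hypothetical sketch of how one task τ could be split into a training (support) set D^t_τ and a validation (query) set D^v_τ for an N-way K-shot episode; the query-set size of 15 is an assumption, not a value reported by the paper.

```python
# Hypothetical episodic split: build D^t_tau (support) and D^v_tau (query)
# for a single N-way K-shot task from a pool of labeled examples.
import random
from collections import defaultdict


def split_task(examples, n_way=5, k_shot=5, k_query=15):
    """examples: list of (x, label) pairs. Returns (d_train, d_val)."""
    by_class = defaultdict(list)
    for x, y in examples:
        by_class[y].append(x)
    classes = random.sample(list(by_class), n_way)   # pick the task's classes
    d_train, d_val = [], []
    for new_label, c in enumerate(classes):          # relabel 0..n_way-1
        xs = random.sample(by_class[c], k_shot + k_query)
        d_train += [(x, new_label) for x in xs[:k_shot]]
        d_val += [(x, new_label) for x in xs[k_shot:]]
    return d_train, d_val
```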
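The Experiment Setup row refers to the standard 4-convolutional-layer ConvNet backbone used across few-shot benchmarks. The sketch below assumes the common configuration of 32 filters per block and 84x84 miniImageNet inputs, as in the original MAML setup; the paper's exact architecture and hyperparameters are specified in its SM.

```python
# Minimal sketch of the standard Conv-4 backbone for N-way classification.
# Assumptions: 32 filters per block and 84x84 RGB inputs (so the feature map
# is 5x5 after four 2x2 max-pools); not necessarily the paper's exact config.
import torch.nn as nn


def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class ConvNet4(nn.Module):
    def __init__(self, n_way=5, hidden=32):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, hidden),
            conv_block(hidden, hidden),
            conv_block(hidden, hidden),
            conv_block(hidden, hidden),
        )
        self.classifier = nn.Linear(hidden * 5 * 5, n_way)  # 84x84 -> 5x5

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```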