A contrastive rule for meta-learning

Authors: Nicolas Zucchet, Simon Schug, Johannes von Oswald, Dominic Zhao, João Sacramento

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The objective of our experiments is twofold. First, we aim to confirm our theoretical results and demonstrate the performance of contrastive meta-learning on standard machine learning benchmarks." "As a first set of experiments, we study a supervised meta-optimization problem based on the entire CIFAR-10 image dataset [48]." "We find that our meta-learning rule outperforms all three baseline implicit differentiation methods in terms of both evaluation-set and actual generalization (test-set) performance, cf. Tab. 1." (A sketch of the two-phase contrastive rule is given after the table.)
Researcher Affiliation | Academia | Nicolas Zucchet (Department of Computer Science, ETH Zürich, nzucchet@inf.ethz.ch); Simon Schug (Institute of Neuroinformatics, University of Zürich & ETH Zürich, sschug@ethz.ch); Johannes von Oswald (Department of Computer Science, ETH Zürich, voswaldj@ethz.ch); Dominic Zhao (Institute of Neuroinformatics, University of Zürich & ETH Zürich, dozhao@ethz.ch); João Sacramento (Institute of Neuroinformatics, University of Zürich & ETH Zürich, rjoao@ethz.ch)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/smonsays/contrastive-meta-learning
Open Datasets | Yes | "As a first set of experiments, we study a supervised meta-optimization problem based on the entire CIFAR-10 image dataset [48]." "To that end, we focus on two widely-studied few-shot image classification problems based on the miniImageNet [56] and Omniglot [57] datasets."
Dataset Splits | Yes | "As the meta-objective we take the cross-entropy loss ℓ evaluated on a held-out dataset D^eval." "N-way K-shot tasks are created on-the-fly by sampling N classes at random from a fixed pool of classes, and then splitting the data into task-specific learning D^learn_τ (with K examples per class for learning) and evaluation D^eval_τ sets, used to define the corresponding loss functions L^learn_τ and L^eval_τ." "The validation loss is a proxy for the quality of the gradient and the number of steps in the first phase is a proxy for log δ." (See the task-sampling sketch after the table.)
Hardware Specification | No | "Since this is intractable in this case and running full backpropagation-through-learning requires too much memory, we evaluate truncated backpropagation-through-learning (TBPTL) with the maximal truncation window we can fit on a single graphics processing unit (in our case 200 out of 5000 steps)." The paper mentions a single GPU but does not specify the hardware model. (See the truncation sketch after the table.)
Software Dependencies | No | The main paper does not provide specific software dependencies with version numbers; it refers readers to the supplementary material for "Complete hyperparameters and training details".
Experiment Setup | No | The main paper states "Complete hyperparameters and training details are provided in the SM." but does not list specific hyperparameter values or detailed training configurations within the main text.
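
For context on the Research Type row: the paper's contrastive rule estimates the meta-gradient by contrasting two inner-learning phases, in the spirit of equilibrium propagation. A free phase minimizes the learning loss alone; a nudged phase additionally weights the evaluation loss by a small ε, and the meta-gradient is read off from the difference of the two solutions. The sketch below is ours, not the repository's API; the gradient-descent inner optimizer, the toy losses, and all function names are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def inner_minimize(loss_fn, phi0, steps=500, lr=0.1):
    """Plain gradient descent on an inner objective (a stand-in for the
    paper's inner learning procedure)."""
    grad_fn = jax.grad(loss_fn)
    phi = phi0
    for _ in range(steps):
        phi = phi - lr * grad_fn(phi)
    return phi

def contrastive_meta_grad(L_learn, L_eval, theta, phi0, eps=1e-2):
    """Two-phase contrastive estimate of d L_eval(phi*(theta)) / d theta,
    where phi*(theta) = argmin_phi L_learn(phi, theta).

    Free phase:   phi_free   minimizes L_learn(., theta)
    Nudged phase: phi_nudged minimizes L_learn(., theta) + eps * L_eval(.)
    Estimate:     (dL_learn/dtheta at phi_nudged
                   - dL_learn/dtheta at phi_free) / eps
    """
    phi_free = inner_minimize(lambda p: L_learn(p, theta), phi0)
    phi_nudged = inner_minimize(
        lambda p: L_learn(p, theta) + eps * L_eval(p),
        phi_free)  # warm-start the nudged phase from the free phase
    dtheta = jax.grad(L_learn, argnums=1)
    return (dtheta(phi_nudged, theta) - dtheta(phi_free, theta)) / eps

# Toy check: L_learn pulls phi toward theta, L_eval pulls phi toward 1,
# so the exact meta-gradient at theta = 0 is 2 * (theta - 1) = -2.
theta = jnp.array([0.0])
L_learn = lambda phi, th: jnp.sum((phi - th) ** 2)
L_eval = lambda phi: jnp.sum((phi - 1.0) ** 2)
print(contrastive_meta_grad(L_learn, L_eval, theta, jnp.array([0.5])))
# prints roughly [-1.98], close to the exact value -2
```

As ε → 0 this finite difference converges to the exact meta-gradient at the inner minimum; warm-starting the nudged phase from the free-phase solution, as done in the sketch, keeps the two solutions close and the estimate stable.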
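The on-the-fly split procedure quoted in the Dataset Splits row can be made concrete with a short sketch. The function name, the array-based dataset layout, and the default shot counts are our assumptions, not the repository's API.

```python
import numpy as np

def sample_task(images, labels, n_way=5, k_shot=1, n_eval=15, rng=None):
    """Draw one N-way K-shot task: sample N classes at random, relabel
    them 0..N-1, and split each class into K learning and n_eval
    evaluation examples, yielding (D_learn, D_eval) for that task."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    learn_x, learn_y, eval_x, eval_y = [], [], [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        learn_x.append(images[idx[:k_shot]])
        learn_y.append(np.full(k_shot, new_label))
        eval_x.append(images[idx[k_shot:k_shot + n_eval]])
        eval_y.append(np.full(n_eval, new_label))
    return ((np.concatenate(learn_x), np.concatenate(learn_y)),
            (np.concatenate(eval_x), np.concatenate(eval_y)))
```

The learning set D^learn_τ defines L^learn_τ for the inner phases, while the held-out evaluation set D^eval_τ defines the meta-objective L^eval_τ, matching the quoted description.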
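For the TBPTL baseline quoted under Hardware Specification, truncation bounds memory by differentiating through only the last window of inner updates (200 of 5000 in the paper). A conceptual sketch under our own assumptions; inner_step and L_eval are hypothetical stand-ins, and a practical version would unroll with jax.lax.scan rather than a Python loop.

```python
import jax

def tbptl_meta_grad(inner_step, L_eval, theta, phi0,
                    n_steps=5000, window=200):
    """Truncated backpropagation-through-learning: run the first
    n_steps - window inner updates without gradient bookkeeping, then
    differentiate the evaluation loss through the final window only."""
    # Phase 1: plain forward rollout; no activations are kept for backprop.
    phi = phi0
    for _ in range(n_steps - window):
        phi = inner_step(phi, theta)
    phi = jax.lax.stop_gradient(phi)  # truncation boundary

    # Phase 2: backpropagate through the last `window` updates w.r.t. theta.
    def truncated_eval(theta, phi):
        for _ in range(window):
            phi = inner_step(phi, theta)
        return L_eval(phi)
    return jax.grad(truncated_eval)(theta, phi)
```

Memory now scales with the window length rather than the full unroll, at the cost of ignoring how theta influenced the earlier updates, which is the bias the paper's comparison probes.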