A contrastive rule for meta-learning
Authors: Nicolas Zucchet, Simon Schug, Johannes von Oswald, Dominic Zhao, João Sacramento
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The objective of our experiments is twofold. First, we aim to confirm our theoretical results and demonstrate the performance of contrastive meta-learning on standard machine learning benchmarks. As a first set of experiments, we study a supervised meta-optimization problem based on the entire CIFAR-10 image dataset [48]. We find that our meta-learning rule outperforms all three baseline implicit differentiation methods in terms of both evaluation-set and actual generalization (test-set) performance, cf. Tab. 1. |
| Researcher Affiliation | Academia | Nicolas Zucchet, Department of Computer Science, ETH Zürich (nzucchet@inf.ethz.ch); Simon Schug, Institute of Neuroinformatics, University of Zürich & ETH Zürich (sschug@ethz.ch); Johannes von Oswald, Department of Computer Science, ETH Zürich (voswaldj@ethz.ch); Dominic Zhao, Institute of Neuroinformatics, University of Zürich & ETH Zürich (dozhao@ethz.ch); João Sacramento, Institute of Neuroinformatics, University of Zürich & ETH Zürich (rjoao@ethz.ch) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/smonsays/contrastive-meta-learning |
| Open Datasets | Yes | As a first set of experiments, we study a supervised meta-optimization problem based on the entire CIFAR-10 image dataset [48]. To that end, we focus on two widely-studied few-shot image classification problems based on the miniImageNet [56] and Omniglot [57] datasets. |
| Dataset Splits | Yes | As the meta-objective we take the cross-entropy loss l evaluated on a held-out dataset D^eval. N-way K-shot tasks are created on-the-fly by sampling N classes at random from a fixed pool of classes, and then splitting the data into task-specific learning D^learn_τ (with K examples per class for learning) and evaluation D^eval_τ sets, used to define the corresponding loss functions L^learn_τ and L^eval_τ. The validation loss is a proxy for the quality of the gradient and the number of steps in the first phase is a proxy for log δ. |
| Hardware Specification | No | Since this is intractable in this case and running full backpropagation-through-learning requires too much memory, we evaluate truncated backpropagation-through-learning (TBPTL) with the maximal truncation window we can fit on a single graphics processing unit (in our case 200 out of 5000 steps). |
| Software Dependencies | No | The main paper does not provide specific software dependencies with version numbers. It refers to supplementary material for 'Complete hyperparameters and training details'. |
| Experiment Setup | No | The main paper states 'Complete hyperparameters and training details are provided in the SM.' but does not list specific hyperparameter values or detailed training configurations within the main text. |
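The on-the-fly N-way K-shot task construction quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `dataset` dictionary layout and the `sample_task` helper are hypothetical stand-ins for the paper's actual data pipeline.

```python
import random

def sample_task(dataset, num_ways, num_shots, num_eval_per_class):
    """Build one N-way K-shot task on the fly.

    dataset: dict mapping class label -> list of examples (hypothetical layout).
    Returns (learn_set, eval_set): lists of (example, task_label) pairs, where
    task_label is the index of the class within this task (0..num_ways-1).
    """
    # Sample N classes at random from the fixed pool of classes.
    classes = random.sample(sorted(dataset.keys()), num_ways)
    learn_set, eval_set = [], []
    for task_label, cls in enumerate(classes):
        # Draw K learning examples plus held-out evaluation examples per class.
        examples = random.sample(dataset[cls], num_shots + num_eval_per_class)
        learn_set += [(x, task_label) for x in examples[:num_shots]]
        eval_set += [(x, task_label) for x in examples[num_shots:]]
    return learn_set, eval_set
```

The learning set D^learn_τ would drive the inner-loop adaptation, while the evaluation set D^eval_τ defines the meta-objective for that task, matching the split described in the quoted passage.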