Reconciling meta-learning and continual learning with online mixtures of tasks

Authors: Ghassen Jerfel, Erin Grant, Tom Griffiths, Katherine A. Heller

NeurIPS 2019

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "4 Experiment: miniImageNet few-shot classification"
Researcher Affiliation | Academia | Ghassen Jerfel (gj47@duke.edu), Duke University; Erin Grant (eringrant@berkeley.edu), UC Berkeley; Thomas L. Griffiths (tomg@princeton.edu), Princeton University; Katherine Heller (kheller@stat.duke.edu), Duke University
Pseudocode | Yes | Algorithm 1: Stochastic gradient-based EM for finite and infinite mixtures.

Input: dataset D, meta-learning rate β, adaptation rate α, temperature τ, initial cluster count L0, meta-batch size J, training batch size N, validation batch size M, adaptation iteration count K, global prior G0

    Initialize cluster count L ← L0 and meta-level parameters θ(1), ..., θ(L) ~ G0
    while not converged do
        Draw tasks T1, ..., TJ ~ pD(T)
        for j in 1, ..., J do
            Draw task-specific datapoints xj,1, ..., xj,N+M ~ pTj(x)
            Draw a parameter initialization for a new cluster from the global prior, θ(L+1) ~ G0
            for ℓ in {1, ..., L, L+1} do
                Initialize φ̂(ℓ)j ← θ(ℓ)
                for k in 1, ..., K do
                    Compute task-specific mode estimate, φ̂(ℓ)j ← φ̂(ℓ)j + α ∇φ log p(xj,1:N | φ̂(ℓ)j)
            Compute assignment of tasks to clusters, γj ← E-STEP(xj,1:N, φ̂(1:L)j)
        Update each component ℓ in 1, ..., L: θ(ℓ) ← θ(ℓ) + β · M-STEP({xj,N+1:N+M, φ̂(ℓ)j, γ(ℓ)j} for j = 1, ..., J)
        Summarize {θ(1), ...} to update the global prior G0
    return {θ(1), ...}
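As a reading aid, here is a minimal executable sketch of the training loop above, written in JAX against a toy linear-regression task model. All names (log_lik, adapt, e_step, m_step_grad, train_step) are illustrative assumptions rather than the authors' (unreleased) code, and the sketch omits the new-cluster proposal and the global-prior update for brevity.

    import jax
    import jax.numpy as jnp

    def log_lik(phi, x, y):
        # Gaussian log-likelihood of a linear model (toy stand-in for p(x | phi)).
        return -0.5 * jnp.sum((x @ phi - y) ** 2)

    def adapt(theta, x, y, alpha, K):
        # Inner loop: K gradient-ascent steps from the cluster initialization theta.
        phi = theta
        for _ in range(K):
            phi = phi + alpha * jax.grad(log_lik)(phi, x, y)
        return phi

    def e_step(phis, x, y, tau):
        # Soft task-to-cluster assignment: softmax of tempered training log-likelihoods.
        scores = jnp.stack([log_lik(phi, x, y) for phi in phis]) / tau
        return jax.nn.softmax(scores)

    def m_step_grad(theta, x_tr, y_tr, x_val, y_val, alpha, K):
        # MAML-style outer gradient: differentiate the validation log-likelihood
        # of the adapted parameters with respect to the cluster initialization.
        def outer(th):
            return log_lik(adapt(th, x_tr, y_tr, alpha, K), x_val, y_val)
        return jax.grad(outer)(theta)

    def train_step(thetas, task_batch, alpha, beta, tau, K):
        # One pass of the while-loop body: E-step assignments per task,
        # then an assignment-weighted M-step update of every component.
        grads = [jnp.zeros_like(th) for th in thetas]
        for (x_tr, y_tr, x_val, y_val) in task_batch:  # meta-batch of J tasks
            phis = [adapt(th, x_tr, y_tr, alpha, K) for th in thetas]
            gamma = e_step(phis, x_tr, y_tr, tau)
            for l, th in enumerate(thetas):
                grads[l] = grads[l] + gamma[l] * m_step_grad(
                    th, x_tr, y_tr, x_val, y_val, alpha, K)
        return [th + beta * g for th, g in zip(thetas, grads)]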
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | "We apply Algorithm 3 with Subroutine 2 and L ∈ {2, 3, 4, 5} components to the 1-shot and 5-shot, 5-way, miniImageNet few-shot classification benchmarks [53]; Appendix C.2.1 contains additional experimental details."
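The component-count sweep quoted above could be driven by a loop like the following, reusing train_step from the Pseudocode row. The dimensionality d, the task stream tasks, and all hyperparameter values are hypothetical placeholders, not values from the paper.

    key = jax.random.PRNGKey(0)
    for L in (2, 3, 4, 5):
        # Draw L cluster initializations from a standard normal as a stand-in for G0.
        thetas = [jax.random.normal(k, (d,)) for k in jax.random.split(key, L)]
        for task_batch in tasks:  # stream of meta-batches
            thetas = train_step(thetas, task_batch, alpha=0.01, beta=0.001, tau=1.0, K=5)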
Dataset Splits | No | The paper mentions "training batch size N, validation batch size M" in Algorithm 1 and "additional experimental details" in appendices, but does not provide explicit ratios, counts, or methodology for the overall dataset splits needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | No | The paper lists hyperparameters in Algorithm 1 (meta-learning rate β, adaptation rate α, temperature τ, initial cluster count L0, meta-batch size J, training batch size N, validation batch size M, adaptation iteration count K), but defers their specific values and other experimental setup details to Appendices C.2.1, C.2.2, and C.2.3, which are not included in the main text.
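For concreteness, the missing setup corresponds to a configuration like the sketch below. Every value shown is a placeholder assumption, since the paper defers the real values to Appendix C.2.

    # Hypothetical hyperparameter configuration mirroring Algorithm 1's inputs;
    # all numeric values are illustrative, not taken from the paper.
    config = {
        "meta_learning_rate": 1e-3,   # β: outer-loop (M-step) step size
        "adaptation_rate": 1e-2,      # α: inner-loop step size
        "temperature": 1.0,           # τ: tempers the E-step softmax
        "initial_cluster_count": 2,   # L0: mixture components at the start
        "meta_batch_size": 4,         # J: tasks per meta-update
        "train_batch_size": 5,        # N: support examples per task
        "val_batch_size": 15,         # M: query examples per task
        "adaptation_steps": 5,        # K: inner-loop gradient steps
    }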