Reconciling meta-learning and continual learning with online mixtures of tasks
Authors: Ghassen Jerfel, Erin Grant, Tom Griffiths, Katherine A. Heller
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4, "Experiment: miniImageNet few-shot classification" |
| Researcher Affiliation | Academia | Ghassen Jerfel (gj47@duke.edu), Duke University; Erin Grant (eringrant@berkeley.edu), UC Berkeley; Thomas L. Griffiths (tomg@princeton.edu), Princeton University; Katherine Heller (kheller@stat.duke.edu), Duke University |
| Pseudocode | Yes | Algorithm 1, "Stochastic gradient-based EM for finite and infinite mixtures" (inputs: dataset D, meta-learning rate β, adaptation rate α, temperature τ, initial cluster count L₀, meta-batch size J, training batch size N, validation batch size M, adaptation iteration count K, global prior G₀): initialize cluster count L ← L₀ and meta-level parameters θ⁽¹⁾, …, θ⁽ᴸ⁾ ~ G₀; while not converged, draw tasks T₁, …, T_J ~ p_D(T); for each j in 1, …, J, draw task-specific datapoints x_{j,1}, …, x_{j,N+M} ~ p_{T_j}(x), draw a parameter initialization for a new cluster from the global prior, θ⁽ᴸ⁺¹⁾ ~ G₀, and for each ℓ in {1, …, L, L+1} initialize φ̂_j⁽ℓ⁾ ← θ⁽ℓ⁾ and compute the task-specific mode estimate via K steps of ∇_φ log p(x_{j,1:N} \| φ̂_j⁽ℓ⁾); compute the assignment of tasks to clusters, γ_j ← E-STEP(x_{j,1:N}, φ̂_j⁽¹:ᴸ⁾); update each component ℓ in 1, …, L via θ⁽ℓ⁾ ← θ⁽ℓ⁾ + β · M-STEP({x_{j,N+1:N+M}, φ̂_j⁽ℓ⁾, γ_j}_{j=1}^J); summarize {θ⁽¹⁾, …} to update the global prior G₀; return {θ⁽¹⁾, …}. (A toy Python sketch of this loop appears after the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We apply Algorithm 3 with Subroutine 2 and L ∈ {2, 3, 4, 5} components to the 1-shot and 5-shot, 5-way, miniImageNet few-shot classification benchmarks [53]; Appendix C.2.1 contains additional experimental details. |
| Dataset Splits | No | The paper mentions 'training batch size N, validation batch size M' in Algorithm 1 and 'additional experimental details' in appendices, but does not give explicit ratios, counts, or a methodology for the dataset splits needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not name specific software dependencies with version numbers (e.g., libraries or solvers) needed to replicate the experiment. |
| Experiment Setup | No | The paper mentions hyperparameters such as 'meta-learning rate β, adaptation rate α, temperature τ, initial cluster count L₀, meta-batch size J, training batch size N, validation batch size M, adaptation iteration count K' in Algorithm 1, but defers their specific values and other experimental setup details to appendices (C.2.1, C.2.2, C.2.3), which are not provided in the main text. |
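
To make the Pseudocode row above concrete, the following is a minimal Python sketch of Algorithm 1's stochastic EM loop on a toy task distribution. It is an illustration under assumptions, not the authors' implementation: the cosine-feature model, the sinusoid-free linear task distribution, the first-order (no second-order MAML) meta-update, and all names (`features`, `sample_task`, `adapt`, `sample_from_prior`) are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters mirroring Algorithm 1's inputs.
L0, J, N, M, K = 2, 4, 10, 10, 5     # clusters, meta-batch, train/val sizes, adaptation steps
alpha, beta, tau = 0.01, 0.001, 1.0  # adaptation rate, meta-learning rate, temperature
D = 8                                # parameter dimension

def features(x):
    # Fixed cosine features: a stand-in for a network's representation.
    return np.cos(np.outer(x, np.arange(1, D + 1)))

def sample_task():
    # Each task is a random linear map over the features, plus observation noise.
    w = rng.normal(size=D)
    def draw(n):
        x = rng.uniform(-np.pi, np.pi, size=n)
        y = features(x) @ w + 0.1 * rng.normal(size=n)
        return x, y
    return draw

def nll(theta, x, y):
    # Gaussian negative log-likelihood up to constants: mean squared error.
    r = features(x) @ theta - y
    return 0.5 * np.mean(r ** 2)

def grad(theta, x, y):
    # Gradient of nll with respect to theta.
    F = features(x)
    return F.T @ (F @ theta - y) / len(x)

def adapt(theta, x, y):
    # Inner loop: K gradient steps from a cluster's initialization (MAML-style).
    phi = theta.copy()
    for _ in range(K):
        phi = phi - alpha * grad(phi, x, y)
    return phi

def sample_from_prior():
    # Global prior G0 over initializations: here just a small Gaussian.
    return 0.1 * rng.normal(size=D)

thetas = [sample_from_prior() for _ in range(L0)]  # mixture components theta^(l)

for step in range(200):                 # "while not converged"
    tasks = [sample_task() for _ in range(J)]
    theta_new = sample_from_prior()     # candidate new cluster, theta^(L+1) ~ G0
    components = thetas + [theta_new]
    per_task = []
    for draw in tasks:
        x_tr, y_tr = draw(N)            # training batch of size N
        x_val, y_val = draw(M)          # validation batch of size M
        phis = [adapt(t, x_tr, y_tr) for t in components]
        # E-step: responsibilities from tempered training-set likelihoods.
        logits = np.array([-nll(p, x_tr, y_tr) / tau for p in phis])
        gamma = np.exp(logits - logits.max())
        gamma /= gamma.sum()
        per_task.append((gamma, phis, x_val, y_val))
    # M-step: responsibility-weighted first-order meta-gradient on validation data.
    for l in range(len(thetas)):
        g = np.zeros(D)
        for gamma, phis, x_val, y_val in per_task:
            g += gamma[l] * grad(phis[l], x_val, y_val)
        thetas[l] = thetas[l] - beta * g / J
```

Two simplifications relative to the full algorithm are worth flagging: the meta-update here evaluates the validation gradient at the adapted parameters (a first-order approximation rather than differentiating through the inner loop), and the sketch scores the candidate component θ⁽ᴸ⁺¹⁾ in the E-step but never promotes it to a new cluster or summarizes the components to update G₀, steps the infinite-mixture variant of Algorithm 1 would perform.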