Meta-Learning without Memorization

Authors: Mingzhang Yin, George Tucker, Mingyuan Zhou, Sergey Levine, Chelsea Finn

ICLR 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In the experimental evaluation, we aim to answer the following questions: (1) How prevalent is the memorization problem across different algorithms and domains? (2) How does the memorization problem affect the performance of algorithms on non-mutually-exclusive task distributions? (3) Is our meta-regularization approach effective for mitigating the problem and is it compatible with multiple types of meta-learning algorithms? (4) Is the problem of memorization empirically distinct from that of the standard overfitting problem?" and "Table 1: Test MSE for the non-mutually-exclusive sinusoid regression problem." |
| Researcher Affiliation | Collaboration | ¹UT Austin, ²Google Research, Brain Team, ³UC Berkeley, ⁴Stanford |
| Pseudocode | Yes | Algorithm 1: Meta-Regularized CNP; Algorithm 2: Meta-Regularized MAML; Algorithm 3: Meta-Regularized Methods in Meta-Testing |
| Open Source Code | Yes | Implementation and examples available at https://github.com/google-research/google-research/tree/master/meta_learning_without_memorization |
| Open Datasets | Yes | Omniglot (Lake et al., 2011) and MiniImagenet (Ravi & Larochelle, 2016; Vinyals et al., 2016) benchmarks, and Pascal 3D data (Xiang et al., 2014) |
| Dataset Splits | Yes | "We choose the learning rate from {0.0001, 0.0005, 0.001} for each method, β from {10⁻⁶, 10⁻⁵, …, 1} for meta-regularization and report the results with the best hyperparameters (as measured on the meta-validation set) for each method." and "In the derivation, we also explicitly consider the splitting of data into the task training set and task validation set, which is aligned with the practical setting." |
| Hardware Specification | No | No specific details about the hardware used for the experiments (e.g., CPU/GPU models, memory, or cloud instances) are provided in the paper. |
| Software Dependencies | No | The paper mentions MuJoCo but does not provide version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | "We choose the learning rate from {0.0001, 0.0005, 0.001} for each method, β from {10⁻⁶, 10⁻⁵, …, 1} for meta-regularization and report the results with the best hyperparameters (as measured on the meta-validation set) for each method." and "We use a meta batch-size of 10 tasks per iteration." and "The network is trained using 5 gradient steps with learning rate 0.01 in the inner loop for adaptation and evaluated using 20 gradient steps at the test-time." |
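The Experiment Setup row above describes a MAML-style inner-loop schedule: 5 gradient steps with learning rate 0.01 for adaptation during meta-training, and 20 gradient steps at test time. Below is a minimal NumPy sketch of that adaptation loop only, under simplifying assumptions: a linear model in place of the paper's neural network, synthetic task data, and a zero initialization standing in for the meta-learned one. It illustrates the schedule, not the authors' implementation.

```python
import numpy as np

def inner_adapt(theta, x, y, steps, lr=0.01):
    """Adapt parameters theta = (w, b) of a linear model y ≈ w*x + b
    by plain gradient descent on the MSE, as in a MAML inner loop."""
    w, b = theta
    for _ in range(steps):
        err = w * x + b - y
        # Gradients of MSE = mean(err**2) with respect to w and b.
        gw = 2.0 * np.mean(err * x)
        gb = 2.0 * np.mean(err)
        w, b = w - lr * gw, b - lr * gb
    return w, b

# Synthetic single-task regression data (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, size=32)
y = 2.0 * x + 1.0  # ground-truth linear task

theta0 = (0.0, 0.0)  # stand-in for the meta-learned initialization
# 5 adaptation steps at meta-training time, 20 at test time,
# both with inner-loop learning rate 0.01, per the quoted setup.
theta_train = inner_adapt(theta0, x, y, steps=5)
theta_test = inner_adapt(theta0, x, y, steps=20)
```

With a learning rate this small relative to the quadratic loss curvature, each extra inner step monotonically reduces the task loss, which is why evaluating with more steps (20) than were used during meta-training (5) is a sensible test-time choice.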