Meta-Learning without Memorization
Authors: Mingzhang Yin, George Tucker, Mingyuan Zhou, Sergey Levine, Chelsea Finn
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experimental evaluation, we aim to answer the following questions: (1) How prevalent is the memorization problem across different algorithms and domains? (2) How does the memorization problem affect the performance of algorithms on non-mutually-exclusive task distributions? (3) Is our meta-regularization approach effective for mitigating the problem and is it compatible with multiple types of meta-learning algorithms? (4) Is the problem of memorization empirically distinct from that of the standard overfitting problem? and Table 1: Test MSE for the non-mutually-exclusive sinusoid regression problem. |
| Researcher Affiliation | Collaboration | ¹UT Austin, ²Google Research, Brain Team, ³UC Berkeley, ⁴Stanford |
| Pseudocode | Yes | Algorithm 1: Meta-Regularized CNP, Algorithm 2: Meta-Regularized MAML, Algorithm 3: Meta-Regularized Methods in Meta-testing (a hedged sketch of a meta-regularized MAML-style objective follows the table). |
| Open Source Code | Yes | Implementation and examples available here: https://github.com/google-research/google-research/tree/master/meta_learning_without_memorization. |
| Open Datasets | Yes | Omniglot (Lake et al., 2011) and MiniImagenet (Ravi & Larochelle, 2016; Vinyals et al., 2016) benchmarks and Pascal 3D data (Xiang et al., 2014). |
| Dataset Splits | Yes | We choose the learning rate from {0.0001, 0.0005, 0.001} for each method, β from {10⁻⁶, 10⁻⁵, …, 1} for meta-regularization and report the results with the best hyperparameters (as measured on the meta-validation set) for each method. and In the derivation, we also explicitly consider the splitting of data into the task training set and task validation set, which is aligned with the practical setting. |
| Hardware Specification | No | No specific details about the hardware used for experiments (e.g., CPU/GPU models, memory, or specific cloud instances) are provided in the paper. |
| Software Dependencies | No | The paper mentions 'MuJoCo' but does not provide specific version numbers for any software dependencies used in their experiments. |
| Experiment Setup | Yes | We choose the learning rate from {0.0001, 0.0005, 0.001} for each method, β from {10⁻⁶, 10⁻⁵, …, 1} for meta-regularization and report the results with the best hyperparameters (as measured on the meta-validation set) for each method. and We use a meta batch-size of 10 tasks per iteration. and The network is trained using 5 gradient steps with learning rate 0.01 in the inner loop for adaptation and evaluated using 20 gradient steps at the test-time. (The hyperparameter grid is sketched below the table.) |
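
The Pseudocode row references Algorithms 1–3 from the paper. The sketch below is a minimal illustration of what a meta-regularized MAML-style outer objective could look like on the sinusoid regression task, assuming a standard MAML inner loop (5 steps at learning rate 0.01, per the setup row) and an L2-style stand-in for the paper's weight regularizer. Names such as `inner_adapt`, `kl_to_prior`, and `meta_loss`, the network sizes, and the synthetic task sampling are illustrative assumptions, not taken from the released code.

```python
import jax
import jax.numpy as jnp

def init_params(key, sizes=(1, 40, 40, 1)):
    """Initialize a small MLP as a list of (weight, bias) pairs."""
    params = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (m, n)) * 0.1, jnp.zeros(n)))
    return params

def forward(params, x):
    h = x
    for w, b in params[:-1]:
        h = jnp.tanh(h @ w + b)
    w, b = params[-1]
    return h @ w + b

def mse(params, x, y):
    return jnp.mean((forward(params, x) - y) ** 2)

def inner_adapt(params, x_tr, y_tr, inner_lr=0.01, steps=5):
    # Task adaptation: 5 gradient steps at learning rate 0.01 (setup row).
    for _ in range(steps):
        grads = jax.grad(mse)(params, x_tr, y_tr)
        params = jax.tree_util.tree_map(lambda p, g: p - inner_lr * g, params, grads)
    return params

def kl_to_prior(params):
    # Stand-in for the meta-regularizer: an L2 penalty toward a zero-mean prior
    # (a fixed-variance Gaussian KL reduces to this), NOT the paper's exact term.
    return 0.5 * sum(jnp.sum(p ** 2) for p in jax.tree_util.tree_leaves(params))

def meta_loss(params, task_batch, beta=1e-4):
    # Outer objective: mean post-adaptation validation MSE over the meta-batch,
    # plus the beta-weighted regularizer on the meta-parameters.
    losses = [mse(inner_adapt(params, x_tr, y_tr), x_val, y_val)
              for x_tr, y_tr, x_val, y_val in task_batch]
    return jnp.mean(jnp.stack(losses)) + beta * kl_to_prior(params)

# One outer update over a meta-batch of 10 synthetic sinusoid tasks.
key = jax.random.PRNGKey(0)
params = init_params(key)
tasks = []
for i in range(10):
    key, k1, k2 = jax.random.split(key, 3)
    amp, phase = 1.0 + 0.1 * i, 0.3 * i
    x_tr = jax.random.uniform(k1, (10, 1), minval=-5.0, maxval=5.0)
    x_val = jax.random.uniform(k2, (10, 1), minval=-5.0, maxval=5.0)
    tasks.append((x_tr, amp * jnp.sin(x_tr + phase),
                  x_val, amp * jnp.sin(x_val + phase)))
grads = jax.grad(meta_loss)(params, tasks)
params = jax.tree_util.tree_map(lambda p, g: p - 0.001 * g, params, grads)
```

Differentiating `meta_loss` through `inner_adapt` gives the second-order meta-gradient; the outer learning rate (0.001 here) would be one of the values in the grid from the Experiment Setup row.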
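The Experiment Setup row describes the hyperparameter search. The snippet below only lays out that grid and a selection-by-meta-validation loop; `train_and_eval` is a hypothetical stand-in that returns a dummy score so the loop executes, whereas the real selection would come from full training runs evaluated on the meta-validation set.

```python
import itertools

# Grid from the Experiment Setup row: outer learning rate and beta.
learning_rates = [0.0001, 0.0005, 0.001]
betas = [10.0 ** -k for k in range(6, -1, -1)]   # 1e-6, 1e-5, ..., 0.1, 1
meta_batch_size = 10                              # tasks per meta-iteration
inner_steps_train, inner_steps_test = 5, 20       # adaptation steps
inner_lr = 0.01

def train_and_eval(lr, beta):
    # Hypothetical stand-in: a real run would train MR-MAML/MR-CNP with these
    # settings and return the meta-validation MSE. The dummy value below only
    # keeps the selection loop executable.
    return (lr - 0.0005) ** 2 + (beta - 1e-3) ** 2

best_score, best_config = float("inf"), None
for lr, beta in itertools.product(learning_rates, betas):
    score = train_and_eval(lr, beta)
    if score < best_score:
        best_score, best_config = score, (lr, beta)

print("selected on meta-validation:", best_config)
```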