Large-Scale Meta-Learning with Continual Trajectory Shifting
Authors: Jaewoong Shin, Hae Beom Lee, Boqing Gong, Sung Ju Hwang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method on a heterogeneous set of large-scale tasks and show that the algorithm largely outperforms the previous first-order meta-learning methods in terms of both generalization performance and convergence, as well as multitask learning and fine-tuning baselines. |
| Researcher Affiliation | Collaboration | (1) Graduate School of AI, KAIST, South Korea; (2) Google, LA; (3) AITRICS, South Korea. |
| Pseudocode | Yes | Algorithm 1: Previous meta-learning algorithms; Algorithm 2: Meta-learning with continual shifting (a toy sketch contrasting the two loop structures appears after this table). |
| Open Source Code | Yes | The code is also publicly available (footnote 3: https://github.com/JWoong148/ContinualTrajectoryShifting). |
| Open Datasets | Yes | For meta-training, we use 7 datasets: Tiny ImageNet (tin), CIFAR100 (Krizhevsky et al., 2009), Stanford Dogs (Khosla et al., 2011), Aircraft (Maji et al., 2013), CUB (Wah et al., 2011), Fashion-MNIST (Xiao et al., 2017a), and SVHN (Netzer et al., 2011). |
| Dataset Splits | No | The paper does not explicitly specify traditional training/validation/test splits with percentages or sample counts for the datasets used. It distinguishes between datasets used for 'meta-training' and 'meta-testing' and reports how many 'inner-optimization' steps or 'training datapoints' are used at meta-test time, but it does not define a validation split for any specific dataset. |
| Hardware Specification | No | The paper does not specify any particular GPU or CPU models, memory sizes, or specific cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like 'ResNet20', 'ResNet18', 'SGD with momentum', and 'Nesterov momentum optimizer', but does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python). |
| Experiment Setup | Yes | Experimental setup: We use α = 0.05, β = 0.1, K = 100, and M = 3. We set the inner-optimizer to SGD with momentum (µ = 0.9). [...] We use α = 0.01, K = 1,000, and M = 200 for all the baselines and our model, except for β, which we found in the range of {10⁻³, 10⁻², 10⁻¹, 10⁰, 10¹}. We use SGD with momentum (µ = 0.9) and weight decay (λ = 0.0005) as the inner optimizer. For meta-testing, we train K = 1,000 steps for each dataset. We use SGD with Nesterov momentum (µ = 0.9) with an appropriate learning rate scheduling. The starting learning rate is α = 0.1 and we use λ = 0.0005. (A hedged mapping of these hyperparameters onto a PyTorch setup follows the sketch below.) |
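
The 'Pseudocode' row names the two loop structures the paper formalizes (Algorithm 1 vs. Algorithm 2). The sketch below is a minimal toy illustration of that contrast, not the authors' exact algorithms: it assumes NumPy, quadratic per-task losses, a Reptile-style first-order meta-gradient, and a simplified "shift every trajectory by the meta-update delta" rule.

```python
# Toy contrast between a conventional first-order meta-learning loop and a
# continually-shifting variant. Illustrative sketch only; the per-step
# meta-gradient and shifting rule are simplifications, not the paper's Algorithm 2.
import numpy as np

rng = np.random.default_rng(0)
tasks = [rng.normal(size=5) for _ in range(3)]   # each task: minimize ||theta - c||^2


def grad(theta, c):
    """Gradient of the toy per-task loss ||theta - c||^2."""
    return 2.0 * (theta - c)


alpha, beta, K = 0.05, 0.1, 100                   # inner lr, meta lr, inner steps


def previous_meta_learning(phi, meta_iters=20):
    """Algorithm-1 style: complete the K-step inner loop before each meta-update."""
    for _ in range(meta_iters):
        meta_grad = np.zeros_like(phi)
        for c in tasks:
            theta = phi.copy()
            for _ in range(K):                    # full inner trajectory from phi
                theta -= alpha * grad(theta, c)
            meta_grad += (phi - theta) / len(tasks)   # Reptile-style first-order meta-gradient
        phi -= beta * meta_grad                   # meta-update only after K inner steps
    return phi


def continual_shifting(phi, meta_iters=20):
    """Algorithm-2 style sketch: meta-update every inner step, then shift trajectories."""
    thetas = [phi.copy() for _ in tasks]          # one running trajectory per task
    for _ in range(meta_iters * K):               # same total inner-step budget
        for theta, c in zip(thetas, tasks):
            theta -= alpha * grad(theta, c)       # one inner step per task (in place)
        meta_grad = np.mean([phi - th for th in thetas], axis=0)
        delta = -beta * meta_grad
        phi = phi + delta                         # frequent meta-update
        thetas = [th + delta for th in thetas]    # shift ongoing trajectories by the same delta
    return phi


phi0 = rng.normal(size=5)
print("previous :", previous_meta_learning(phi0.copy()))
print("shifting :", continual_shifting(phi0.copy()))
```

Both loops spend the same inner-step budget; the difference the sketch highlights is that the second one updates the initialization at every inner step and keeps the task trajectories consistent with it by shifting them, which is the idea the table row refers to.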
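
The 'Experiment Setup' row quotes hyperparameters, but the paper does not name its framework (see 'Software Dependencies'). The snippet below is a hedged guess at how those values could map onto a PyTorch configuration; `torch`/`torchvision`, the `resnet18` stand-in for the paper's ResNet backbones, and the cosine schedule are assumptions, not details taken from the paper.

```python
# Hypothetical PyTorch mapping of the hyperparameters quoted above.
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=100)   # stand-in for the ResNet18/ResNet20 backbones the paper mentions

# Inner optimizer (large-scale meta-training): SGD, momentum 0.9, weight decay 5e-4, alpha = 0.01.
inner_opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

# Meta-testing: K = 1,000 steps of SGD with Nesterov momentum, starting lr 0.1,
# weight decay 5e-4, and "an appropriate learning rate scheduling"
# (cosine annealing here is an assumption; the schedule is not specified).
test_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                           nesterov=True, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(test_opt, T_max=1000)

# Meta-level hyperparameters as reported: small-scale setup ...
meta_config_small = dict(alpha=0.05, beta=0.1, K=100, M=3)
# ... and large-scale setup (beta searched over {1e-3, 1e-2, 1e-1, 1e0, 1e1}).
meta_config_large = dict(alpha=0.01, K=1000, M=200)
```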