Large-Scale Meta-Learning with Continual Trajectory Shifting

Authors: Jaewoong Shin, Hae Beom Lee, Boqing Gong, Sung Ju Hwang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our method on a heterogeneous set of large-scale tasks and show that the algorithm largely outperforms the previous first-order meta-learning methods in terms of both generalization performance and convergence, as well as multitask learning and fine-tuning baselines.
Researcher Affiliation | Collaboration | 1) Graduate School of AI, KAIST, South Korea; 2) Google, LA; 3) AITRICS, South Korea.
Pseudocode | Yes | Algorithm 1: Previous meta-learning algorithms; Algorithm 2: Meta-learning with continual shifting (a generic first-order meta-update is sketched after the table).
Open Source Code | Yes | The code is also publicly available (footnote 3: https://github.com/JWoong148/ Continual Trajectory Shifting).
Open Datasets | Yes | For meta-training, we use 7 datasets: Tiny ImageNet (tin), CIFAR100 (Krizhevsky et al., 2009), Stanford Dogs (Khosla et al., 2011), Aircraft (Maji et al., 2013), CUB (Wah et al., 2011), Fashion-MNIST (Xiao et al., 2017a), and SVHN (Netzer et al., 2011). (A partial torchvision loading sketch follows the table.)
Dataset Splits | No | The paper does not explicitly specify traditional training/validation/test splits with percentages or sample counts for the datasets used. It distinguishes between datasets used for meta-training and meta-testing, and reports how many inner-optimization steps are taken or how many training datapoints are used at meta-test time, but it does not define a clear validation split for any specific dataset.
Hardware Specification | No | The paper does not specify any particular GPU or CPU models, memory sizes, or specific cloud computing instances used for running the experiments.
Software Dependencies | No | The paper mentions software components like 'ResNet20', 'ResNet18', 'SGD with momentum', and 'Nesterov momentum optimizer', but does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python).
Experiment Setup | Yes | Experimental setup: We use α = 0.05, β = 0.1, K = 100, and M = 3. We set the inner-optimizer to SGD with momentum (µ = 0.9). [...] We use α = 0.01, K = 1,000, and M = 200 for all the baselines and our model, except for β, which we found in the range of {10⁻³, 10⁻², 10⁻¹, 10⁰, 10¹}. We use SGD with momentum (µ = 0.9) and weight decay (λ = 0.0005) as the inner optimizer. For meta-testing, we train K = 1,000 steps for each dataset. We use the SGD with Nesterov momentum optimizer (µ = 0.9) with an appropriate learning-rate schedule. The starting learning rate is α = 0.1 and we use λ = 0.0005.
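
For context on the Pseudocode row, the following is a minimal sketch of a Reptile-style first-order meta-update of the general kind covered by the paper's Algorithm 1; the continual trajectory-shifting variant (Algorithm 2) is not reproduced here. All names (`inner_sgd_steps`, `reptile_meta_update`, `task_loss_fns`) and the default hyperparameters are illustrative assumptions, not the authors' released code.

```python
import copy
import torch

def inner_sgd_steps(model, task_loss_fn, k=100, lr=0.05, momentum=0.9):
    """Run K inner-optimization SGD steps on a copy of the shared initialization."""
    fast_model = copy.deepcopy(model)
    opt = torch.optim.SGD(fast_model.parameters(), lr=lr, momentum=momentum)
    for _ in range(k):
        opt.zero_grad()
        task_loss_fn(fast_model).backward()  # task_loss_fn returns a scalar loss
        opt.step()
    return fast_model

def reptile_meta_update(model, task_loss_fns, k=100, inner_lr=0.05, meta_lr=0.1):
    """First-order meta-update: shift the initialization toward the task solutions.

    len(task_loss_fns) plays the role of the meta batch size M.
    """
    deltas = [torch.zeros_like(p) for p in model.parameters()]
    for task_loss_fn in task_loss_fns:
        fast_model = inner_sgd_steps(model, task_loss_fn, k, inner_lr)
        for d, p, q in zip(deltas, model.parameters(), fast_model.parameters()):
            d += (q.detach() - p.detach()) / len(task_loss_fns)
    with torch.no_grad():
        for p, d in zip(model.parameters(), deltas):
            p += meta_lr * d  # move the shared initialization
```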
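The Open Datasets row lists only publicly downloadable datasets, a few of which ship with torchvision. The snippet below is a partial sketch covering just those built-in loaders (CIFAR-100, Fashion-MNIST, SVHN); Tiny ImageNet, Stanford Dogs, CUB, and Aircraft must be obtained separately, and the preprocessing shown is an assumption rather than the paper's pipeline.

```python
from torchvision import datasets, transforms

# Illustrative preprocessing; the paper's exact transforms are not reproduced here.
tfm = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])

# Subset of the 7 meta-training datasets that torchvision provides out of the box.
meta_train_sets = {
    "cifar100": datasets.CIFAR100("data/", train=True, download=True, transform=tfm),
    "fmnist":   datasets.FashionMNIST("data/", train=True, download=True, transform=tfm),
    "svhn":     datasets.SVHN("data/", split="train", download=True, transform=tfm),
}
# Tiny ImageNet, Stanford Dogs, CUB, and Aircraft require external downloads and custom loaders.
```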
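The hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch optimizer configuration. The sketch below only illustrates how the reported values (inner α, µ, λ, and the Nesterov optimizer for meta-testing) could be wired up; the backbone choice and the cosine schedule are assumptions, since the paper does not release this as its exact setup.

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=100)  # placeholder backbone; the paper also mentions a ResNet20

# Inner optimizer during large-scale meta-training: SGD with momentum and weight decay,
# matching the reported α = 0.01, µ = 0.9, λ = 0.0005.
inner_opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

# Meta-testing: SGD with Nesterov momentum, starting learning rate α = 0.1, trained for
# K = 1,000 steps with "an appropriate learning rate scheduling" (cosine is an assumption).
test_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                           nesterov=True, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(test_opt, T_max=1000)
```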