Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Large-Scale Meta-Learning with Continual Trajectory Shifting

Authors: Jaewoong Shin, Hae Beom Lee, Boqing Gong, Sung Ju Hwang

ICML 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our method on a heterogeneous set of large-scale tasks and show that the algorithm largely outperforms the previous first-order meta-learning methods in terms of both generalization performance and convergence, as well as multitask learning and fine-tuning baselines.
Researcher Affiliation | Collaboration | 1 Graduate School of AI, KAIST, South Korea; 2 Google, LA; 3 AITRICS, South Korea.
Pseudocode | Yes | Algorithm 1: Previous meta-learning algorithms; Algorithm 2: Meta-learning with continual shifting
Open Source Code | Yes | The code is also publicly available (footnote 3: https://github.com/JWoong148/ContinualTrajectoryShifting)
Open Datasets | Yes | For meta-training, we use 7 datasets: Tiny ImageNet (tin), CIFAR100 (Krizhevsky et al., 2009), Stanford Dogs (Khosla et al., 2011), Aircraft (Maji et al., 2013), CUB (Wah et al., 2011), Fashion-MNIST (Xiao et al., 2017a), and SVHN (Netzer et al., 2011).
Dataset Splits | No | The paper does not explicitly specify traditional training/validation/test splits with percentages or sample counts for the datasets used. It distinguishes between datasets used for 'meta-training' and 'meta-testing' and describes how many steps are taken for 'inner-optimization' or how many 'training datapoints' are used for meta-testing, but it does not give a clear validation split for any specific dataset.
Hardware Specification | No | The paper does not specify any particular GPU or CPU models, memory sizes, or specific cloud computing instances used for running the experiments.
Software Dependencies | No | The paper mentions software components such as ResNet20, ResNet18, SGD with momentum, and the Nesterov momentum optimizer, but does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python).
Experiment Setup | Yes | Experimental setup: We use α = 0.05, β = 0.1, K = 100, and M = 3. We set the inner-optimizer to SGD with momentum (µ = 0.9). [...] We use α = 0.01, K = 1,000, and M = 200 for all the baselines and our model, except for β, which we found in the range {10^-3, 10^-2, 10^-1, 10^0, 10^1}. We use SGD with momentum (µ = 0.9) and weight decay (λ = 0.0005) as the inner optimizer. For meta-testing, we train K = 1,000 steps for each dataset. We use SGD with the Nesterov momentum optimizer (µ = 0.9) with an appropriate learning rate scheduling. The starting learning rate is α = 0.1 and we use λ = 0.0005.
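For orientation, the inner-optimizer hyperparameters quoted above (momentum µ = 0.9, weight decay λ = 0.0005, starting learning rate α = 0.1) correspond to a standard SGD-with-momentum update. The sketch below is a minimal pure-Python illustration of that update rule; the function name and the toy quadratic objective are illustrative assumptions, not from the paper, and the Nesterov variant and learning-rate schedule mentioned for meta-testing are omitted.

```python
ALPHA = 0.1   # starting learning rate (alpha)
MU = 0.9      # momentum coefficient (mu)
WD = 5e-4     # weight decay (lambda)

def sgd_momentum_step(param, grad, velocity, lr=ALPHA, mu=MU, wd=WD):
    """One SGD-with-momentum step, with L2 weight decay folded into the gradient."""
    g = grad + wd * param                 # weight decay as L2 regularization
    velocity = mu * velocity - lr * g     # momentum accumulation
    return param + velocity, velocity

# Toy usage: minimize f(w) = w^2 (gradient 2w), starting from w = 1.0.
w, v = 1.0, 0.0
for _ in range(300):
    w, v = sgd_momentum_step(w, grad=2.0 * w, velocity=v)
print(f"w after 300 steps: {w:.6f}")
```

With these settings the iterate converges toward the minimum at w = 0, showing how the momentum and weight-decay terms interact in a single scalar update.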