Large-Scale Meta-Learning with Continual Trajectory Shifting

Authors: Jaewoong Shin, Hae Beom Lee, Boqing Gong, Sung Ju Hwang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our method on a heterogeneous set of large-scale tasks and show that the algorithm largely outperforms the previous first-order meta-learning methods in terms of both generalization performance and convergence, as well as multitask learning and fine-tuning baselines.
Researcher Affiliation | Collaboration | 1) Graduate School of AI, KAIST, South Korea; 2) Google, LA; 3) AITRICS, South Korea.
Pseudocode | Yes | Algorithm 1: Previous meta-learning algorithms; Algorithm 2: Meta-learning with continual shifting (a generic first-order meta-update is sketched after the table).
Open Source Code | Yes | The code is also publicly available (footnote 3: https://github.com/JWoong148/ Continual Trajectory Shifting).
Open Datasets | Yes | For meta-training, we use 7 datasets: Tiny ImageNet (tin), CIFAR100 (Krizhevsky et al., 2009), Stanford Dogs (Khosla et al., 2011), Aircraft (Maji et al., 2013), CUB (Wah et al., 2011), Fashion-MNIST (Xiao et al., 2017a), and SVHN (Netzer et al., 2011). (A partial torchvision loading sketch follows the table.)
Dataset Splits | No | The paper does not explicitly specify traditional training/validation/test splits with percentages or sample counts for the datasets used. It distinguishes between datasets used for meta-training and meta-testing, and reports how many inner-optimization steps are taken or how many training datapoints are used at meta-test time, but it does not define a clear validation split for any specific dataset.
Hardware Specification | No | The paper does not specify any particular GPU or CPU models, memory sizes, or specific cloud computing instances used for running the experiments.
Software Dependencies | No | The paper mentions software components like 'ResNet20', 'ResNet18', 'SGD with momentum', and 'Nesterov momentum optimizer', but does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python).
Experiment Setup | Yes | Experimental setup: We use α = 0.05, β = 0.1, K = 100, and M = 3. We set the inner-optimizer to SGD with momentum (µ = 0.9). [...] We use α = 0.01, K = 1,000, and M = 200 for all the baselines and our model, except for β, which we found in the range of {10⁻³, 10⁻², 10⁻¹, 10⁰, 10¹}. We use SGD with momentum (µ = 0.9) and weight decay (λ = 0.0005) as the inner optimizer. For meta-testing, we train K = 1,000 steps for each dataset. We use the SGD with Nesterov momentum optimizer (µ = 0.9) with an appropriate learning-rate schedule. The starting learning rate is α = 0.1 and we use λ = 0.0005.
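
For context on the Pseudocode row, the following is a minimal sketch of a Reptile-style first-order meta-update of the general kind covered by the paper's Algorithm 1; the continual trajectory-shifting variant (Algorithm 2) is not reproduced here. All names (`inner_sgd_steps`, `reptile_meta_update`, `task_loss_fns`) and the default hyperparameters are illustrative assumptions, not the authors' released code.

```python
import copy
import torch

def inner_sgd_steps(model, task_loss_fn, k=100, lr=0.05, momentum=0.9):
    """Run K inner-optimization SGD steps on a copy of the shared initialization."""
    fast_model = copy.deepcopy(model)
    opt = torch.optim.SGD(fast_model.parameters(), lr=lr, momentum=momentum)
    for _ in range(k):
        opt.zero_grad()
        task_loss_fn(fast_model).backward()  # task_loss_fn returns a scalar loss
        opt.step()
    return fast_model

def reptile_meta_update(model, task_loss_fns, k=100, inner_lr=0.05, meta_lr=0.1):
    """First-order meta-update: shift the initialization toward the task solutions.

    len(task_loss_fns) plays the role of the meta batch size M.
    """
    deltas = [torch.zeros_like(p) for p in model.parameters()]
    for task_loss_fn in task_loss_fns:
        fast_model = inner_sgd_steps(model, task_loss_fn, k, inner_lr)
        for d, p, q in zip(deltas, model.parameters(), fast_model.parameters()):
            d += (q.detach() - p.detach()) / len(task_loss_fns)
    with torch.no_grad():
        for p, d in zip(model.parameters(), deltas):
            p += meta_lr * d  # move the shared initialization
```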
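The Open Datasets row lists only publicly downloadable datasets, a few of which ship with torchvision. The snippet below is a partial sketch covering just those built-in loaders (CIFAR-100, Fashion-MNIST, SVHN); Tiny ImageNet, Stanford Dogs, CUB, and Aircraft must be obtained separately, and the preprocessing shown is an assumption rather than the paper's pipeline.

```python
from torchvision import datasets, transforms

# Illustrative preprocessing; the paper's exact transforms are not reproduced here.
tfm = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])

# Subset of the 7 meta-training datasets that torchvision provides out of the box.
meta_train_sets = {
    "cifar100": datasets.CIFAR100("data/", train=True, download=True, transform=tfm),
    "fmnist":   datasets.FashionMNIST("data/", train=True, download=True, transform=tfm),
    "svhn":     datasets.SVHN("data/", split="train", download=True, transform=tfm),
}
# Tiny ImageNet, Stanford Dogs, CUB, and Aircraft require external downloads and custom loaders.
```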
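The hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch optimizer configuration. The sketch below only illustrates how the reported values (inner α, µ, λ, and the Nesterov optimizer for meta-testing) could be wired up; the backbone choice and the cosine schedule are assumptions, since the paper does not release this as its exact setup.

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=100)  # placeholder backbone; the paper also mentions a ResNet20

# Inner optimizer during large-scale meta-training: SGD with momentum and weight decay,
# matching the reported α = 0.01, µ = 0.9, λ = 0.0005.
inner_opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

# Meta-testing: SGD with Nesterov momentum, starting learning rate α = 0.1, trained for
# K = 1,000 steps with "an appropriate learning rate scheduling" (cosine is an assumption).
test_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                           nesterov=True, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(test_opt, T_max=1000)
```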