Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace

Authors: Yoonho Lee, Seungjin Choi

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We performed experiments to answer: Do our novel components (TW, M, etc.) improve meta-learning performance? (6.1) Is applying a mask M row-wise actually better than applying one parameter-wise? (6.1) To what degree does T alleviate the need for careful tuning of step size α? (6.2) In MT-nets, does learned subspace dimension reflect the difficulty of tasks? (6.3) Can T-nets and MT-nets scale to large-scale meta-learning problems? (6.4)
Researcher Affiliation | Academia | Department of Computer Science and Engineering, Pohang University of Science and Technology, Korea.
Pseudocode | Yes | Algorithm 1 Transformation Networks (T-net); Algorithm 2 Mask Transformation Networks (MT-net). A minimal sketch of these layer updates appears after the table.
Open Source Code | No | The paper mentions 'Most of our experiments were performed by modifying the code accompanying (Finn et al., 2017)', but it does not provide a link or explicit statement about the availability of their own source code.
Open Datasets | Yes | To compare the performance of MT-nets to prior work in meta-learning, we evaluate our method on few-shot classification on the Omniglot (Lake et al., 2015) and Mini Imagenet (Ravi & Larochelle, 2017) datasets.
Dataset Splits | No | The paper describes training and testing examples per task ('Each task consists of K ∈ {5, 10, 20} training examples and 10 testing examples') but does not explicitly mention a distinct validation dataset split with specific percentages or counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2015)' as a meta-optimizer, but it does not specify software components or libraries with version numbers.
Experiment Setup | Yes | We used Adam (Kingma & Ba, 2015) as our meta-optimizer with a learning rate of β = 10^-3. Task-specific learners used step size α = 10^-2. We initialize all ζ to 0, all T as identity matrices, and all W as truncated normal matrices with standard deviation 10^-2. A hedged sketch of this setup follows the table.
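
For reference, here is a minimal numpy sketch of a single T-net/MT-net layer and one masked inner-loop step on a toy squared loss. The shapes, helper names (forward, sample_mask, inner_step), the loss, and the sigmoid parameterization of the mask probability are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3

W = rng.standard_normal((d_out, d_in)) * 1e-2  # task-specific weights
T = np.eye(d_out)                               # meta-learned transformation (identity at init)
zeta = np.zeros(d_out)                          # mask logits, one per row of W

def forward(x, W, T):
    # T-net layer: the meta-learned T transforms the task-specific activation W x.
    return T @ (W @ x)

def sample_mask(zeta, rng):
    # Row-wise binary mask: each row of W is either fully updated or frozen.
    probs = 1.0 / (1.0 + np.exp(-zeta))          # assumed sigmoid parameterization
    return (rng.random(zeta.shape) < probs).astype(float)[:, None]

def inner_step(W, x, y, T, M, alpha=1e-2):
    # One task-specific gradient step on 0.5 * ||T W x - y||^2; only masked rows move.
    err = forward(x, W, T) - y
    grad_W = T.T @ np.outer(err, x)
    return W - alpha * (M * grad_W)

x, y = rng.standard_normal(d_in), rng.standard_normal(d_out)
M = sample_mask(zeta, rng)
W_task = inner_step(W, x, y, T, M)  # W after one inner-loop step on this task
```

In the MT-net, ζ is meta-learned alongside T and W, so the distribution of the sampled mask M determines, per layer, the subspace in which task-specific updates take place.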
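
The Experiment Setup row can also be read as the configuration sketch below, assuming the reported values (Adam with β = 10^-3, inner step size α = 10^-2, W ~ truncated normal with std 10^-2, T = I, ζ = 0). The truncation rule, function names, and layer sizes are assumptions; the paper's actual code builds on the TensorFlow release accompanying Finn et al. (2017) and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

META_LR_BETA = 1e-3    # Adam meta-optimizer learning rate (beta)
INNER_LR_ALPHA = 1e-2  # task-specific learner step size (alpha)

def truncated_normal(shape, std=1e-2, rng=rng):
    # Resample entries falling outside two standard deviations
    # (a common truncation convention; the paper does not state one).
    samples = rng.standard_normal(shape) * std
    bad = np.abs(samples) > 2 * std
    while bad.any():
        samples[bad] = rng.standard_normal(bad.sum()) * std
        bad = np.abs(samples) > 2 * std
    return samples

def init_layer(d_in, d_out):
    # Reported initialization: W ~ truncated normal (std 1e-2), T = I, zeta = 0.
    return {
        "W": truncated_normal((d_out, d_in)),
        "T": np.eye(d_out),
        "zeta": np.zeros(d_out),
    }

params = init_layer(d_in=4, d_out=3)
```

Under the sigmoid parameterization assumed in the earlier sketch, ζ = 0 gives each row of W an even chance of being selected for task-specific updates at the start of meta-training, and T = I means each layer initially acts as a plain linear map.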