Meta-Learning with Self-Improving Momentum Target

Authors: Jihoon Tack, Jongjin Park, Hankook Lee, Jaeho Lee, Jinwoo Shin

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods under various applications, including few-shot regression, few-shot classification, and meta-reinforcement learning.
Researcher Affiliation | Academia | Jihoon Tack¹, Jongjin Park¹, Hankook Lee¹, Jinwoo Shin¹ (¹Korea Advanced Institute of Science and Technology, KAIST); Jaeho Lee² (²Pohang University of Science and Technology, POSTECH)
Pseudocode | Yes | Algorithm 1 SiMT: Self-Improving Momentum Target (a hedged sketch of the momentum-target update appears after this table).
Open Source Code | Yes | Code is available at https://github.com/jihoontack/SiMT.
Open Datasets | Yes | For regression tasks, we demonstrate our experiments on ShapeNet [13] and Pascal [63] datasets... For few-shot classification tasks, we use the cross-entropy loss for the empirical loss term L to train the meta-model, i.e., ∑_{(x,y)∈Q} ℓ_ce(f_φ(x), y), where ℓ_ce is the cross-entropy loss. We train the meta-model on mini-ImageNet [55] and tiered-ImageNet [38] datasets... (the quoted query-loss term is rendered in code after this table).
Dataset Splits | Yes | By following the prior works, we chose the checkpoints and the hyperparameters on the meta-validation set for the few-shot learning tasks [33, 56].
Hardware Specification | Yes | Computational Resources: All experiments are performed on a single machine with 8 NVIDIA A6000 GPUs.
Software Dependencies | No | The paper states: "The code is written in PyTorch [40] and learn2learn [1]." However, it does not specify version numbers for these software components, which are needed for a reproducible setup (a version-recording snippet appears after this table).
Experiment Setup | Yes | We used the Adam optimizer [25] with a learning rate of 1e-3, beta1 = 0.9 and beta2 = 0.999. For few-shot regression, we used the ConvNet backbone with 7 layers proposed in [63], the same as MAML [10] and Meta-SGD [31]. We trained for 100,000 steps with a batch size of 2 tasks for ShapeNet and 10 tasks for Pascal. For few-shot classification, we used Conv4 [55] and ResNet-12 [34] as backbone networks. We trained for 60,000 steps for Conv4 and 30,000 steps for ResNet-12 on mini-ImageNet and tiered-ImageNet, with a batch size of 4 tasks. For meta-RL experiments, we used a policy network with two hidden layers of size 100 and ReLU activation, again trained with Adam and a learning rate of 1e-3. (The optimizer settings are written out as a PyTorch call after this table.)
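
The Pseudocode row above cites Algorithm 1 (SiMT). Below is a minimal sketch of the momentum-target idea, assuming the target network is an exponential moving average (EMA) of the meta-model's weights; the function and variable names are illustrative and not taken from the authors' code.

    # EMA-style momentum-target update: theta_target <- eta * theta_target + (1 - eta) * theta
    import copy

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def update_momentum_target(meta_model: nn.Module, momentum_model: nn.Module, eta: float = 0.995) -> None:
        """In-place EMA update of the momentum target's parameters."""
        for p_target, p_online in zip(momentum_model.parameters(), meta_model.parameters()):
            p_target.mul_(eta).add_(p_online, alpha=1.0 - eta)

    # The momentum target typically starts as a copy of the meta-model and is
    # refreshed after every meta-update step.
    meta_model = nn.Linear(8, 4)               # stand-in for the actual backbone
    momentum_model = copy.deepcopy(meta_model)
    update_momentum_target(meta_model, momentum_model, eta=0.995)

In the paper's framing, the momentum network provides targets for the meta-model, so the coefficient eta controls how quickly those targets track the online meta-model.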
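
The Open Datasets row quotes the empirical query loss ∑_{(x,y)∈Q} ℓ_ce(f_φ(x), y). A small PyTorch rendering of that term, assuming a standard batched query set (the helper name and batch layout are illustrative):

    import torch
    import torch.nn.functional as F

    def query_cross_entropy(model, query_inputs: torch.Tensor, query_labels: torch.Tensor) -> torch.Tensor:
        # f_phi(x) for every query example, then cross-entropy summed over the
        # query set to mirror the summation in the quoted loss term.
        logits = model(query_inputs)
        return F.cross_entropy(logits, query_labels, reduction="sum")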
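
Because the Software Dependencies row notes that no version numbers are given, one simple way to record the PyTorch and learn2learn versions actually used when re-running the released code (the package names come from the paper; the snippet itself does not):

    from importlib.metadata import version

    import torch

    print("torch:", torch.__version__)
    print("learn2learn:", version("learn2learn"))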
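
Finally, the optimizer settings quoted in the Experiment Setup row, written out as a PyTorch call; the placeholder module stands in for the actual backbone (ConvNet, Conv4, or ResNet-12):

    import torch
    import torch.nn as nn

    meta_model = nn.Linear(8, 4)  # placeholder; substitute the real backbone
    optimizer = torch.optim.Adam(meta_model.parameters(), lr=1e-3, betas=(0.9, 0.999))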