Meta-Learning with Self-Improving Momentum Target
Authors: Jihoon Tack, Jongjin Park, Hankook Lee, Jaeho Lee, Jinwoo Shin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods under various applications, including few-shot regression, few-shot classification, and meta-reinforcement learning. |
| Researcher Affiliation | Academia | Jihoon Tack, Jongjin Park, Hankook Lee, Jinwoo Shin (Korea Advanced Institute of Science and Technology, KAIST); Jaeho Lee (Pohang University of Science and Technology, POSTECH) |
| Pseudocode | Yes | Algorithm 1 SiMT: Self-Improving Momentum Target |
| Open Source Code | Yes | Code is available at https://github.com/jihoontack/SiMT. |
| Open Datasets | Yes | For regression tasks, we demonstrate our experiments on ShapeNet [13] and Pascal [63] datasets... For few-shot classification tasks, we use the cross-entropy loss for the empirical loss term $\mathcal{L}$ to train the meta-model, i.e., $\sum_{(x,y) \in Q} \ell_{\text{ce}}(f_\phi(x), y)$ where $\ell_{\text{ce}}$ is the cross-entropy loss. We train the meta-model on mini-ImageNet [55] and tiered-ImageNet [38] datasets... (a sketch of this query-set loss follows the table) |
| Dataset Splits | Yes | By following the prior works, we chose the checkpoints and the hyperparameters on the meta-validation set for the few-shot learning tasks [33, 56]. |
| Hardware Specification | Yes | Computational Resources: All experiments are performed on a single machine with 8 NVIDIA A6000 GPUs. |
| Software Dependencies | No | The paper states: "The code is written in PyTorch [40] and learn2learn [1]." However, it does not specify version numbers for these software components, which is necessary for reproducible setup. |
| Experiment Setup | Yes | We used Adam optimizer [25] with a learning rate of 1e-3, beta1=0.9 and beta2=0.999. For few-shot regression, we used the ConvNet backbone with 7 layers proposed in [63], same as MAML [10] and MetaSGD [31]. We trained for 100,000 steps with a batch size of 2 tasks for ShapeNet and 10 tasks for Pascal. For few-shot classification, we used Conv4 [55] and ResNet-12 [34] as backbone networks. We trained for 60,000 steps for Conv4 and 30,000 steps for ResNet-12 on mini-ImageNet and tiered-ImageNet, with a batch size of 4 tasks. For meta-RL experiments, we used a policy network with two hidden layers of size 100 and ReLU activation. We used Adam optimizer with a learning rate of 1e-3. (see the configuration sketch after the table) |
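
The few-shot classification objective quoted in the Open Datasets row is just the cross-entropy summed over each task's query set $Q$. Below is a minimal PyTorch sketch of that term only; the names `query_loss`, `f_phi`, and `query_set` are illustrative placeholders and are not taken from the released SiMT code.

```python
import torch
import torch.nn.functional as F

def query_loss(f_phi, query_set):
    """Empirical loss L over a query set Q: the sum over (x, y) in Q of the
    cross-entropy l_ce(f_phi(x), y), computed here in a single batch."""
    xs, ys = query_set            # xs: query inputs, ys: integer class labels
    logits = f_phi(xs)            # forward pass of the (task-adapted) meta-model
    # reduction="sum" matches the summation in the quoted formula;
    # implementations often average over the query set instead.
    return F.cross_entropy(logits, ys, reduction="sum")
```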
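The Experiment Setup row also pins down the optimizer configuration (Adam, learning rate 1e-3, beta1=0.9, beta2=0.999) and, for meta-RL, a policy network with two hidden layers of size 100 and ReLU activations. The sketch below shows how a reproducer might instantiate both under stated assumptions: `obs_dim` and `act_dim` are hypothetical environment sizes chosen only to keep the snippet self-contained, and the outer training loop (e.g., 100,000 steps with 2 tasks per batch on ShapeNet) is omitted because it depends on the task samplers.

```python
import torch
import torch.nn as nn

# Hypothetical environment sizes, chosen only so the snippet runs as-is.
obs_dim, act_dim = 20, 6

# Policy network as described for the meta-RL experiments:
# two hidden layers of size 100 with ReLU activations.
policy_net = nn.Sequential(
    nn.Linear(obs_dim, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, act_dim),
)

# Optimizer as reported: Adam with lr 1e-3, beta1 = 0.9, beta2 = 0.999.
optimizer = torch.optim.Adam(policy_net.parameters(),
                             lr=1e-3, betas=(0.9, 0.999))
```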