Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation

Authors: Haoxiang Wang, Han Zhao, Bo Li

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we corroborate our theoretical findings by showing that, with proper implementation, MTL is competitive against state-of-the-art GBML algorithms on a set of few-shot image classification benchmarks. |
| Researcher Affiliation | Academia | University of Illinois at Urbana-Champaign, Urbana, IL, USA. Correspondence to: Haoxiang Wang <hwang264@illinois.edu>, Han Zhao <hanzhao@illinois.edu>, Bo Li <lbo@illinois.edu>. |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is released at https://github.com/AI-secure/multi-task-learning |
| Open Datasets | Yes | We conduct experiments on a set of widely used benchmarks for few-shot image classification: miniImageNet, tieredImageNet, CIFAR-FS and FC100. The first two are derivatives of ImageNet (Deng et al., 2009), while the last two are derivatives of CIFAR-100 (Krizhevsky, 2009). |
| Dataset Splits | Yes | miniImageNet (Vinyals et al., 2016): It contains 60,000 colored images of 84x84 pixels, with 100 classes (each with 600 images) split into 64 training classes, 16 validation classes and 20 test classes. |
| Hardware Specification | Yes | To illustrate this more concretely, we compare the training cost of MTL against MetaOptNet on an AWS server with 4x Nvidia V100 GPU cards (the p3.8xlarge instance in AWS EC2: https://aws.amazon.com/ec2/instance-types/p3/). |
| Software Dependencies | No | The paper mentions software such as PyTorch, scikit-learn, and the learn2learn package, but does not specify their version numbers. |
| Experiment Setup | Yes | Optimization Setup: We use RAdam (Liu et al., 2020), a variant of Adam (Kingma & Ba, 2015), as the optimizer for MTL. We adopt a public PyTorch implementation and use the default hyper-parameters. Besides, we adopt the ReduceLROnPlateau learning rate scheduler with the early stopping regularization. |
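
For illustration, the quoted optimization setup maps onto a fairly standard PyTorch training loop. The sketch below is a minimal, hypothetical rendering of that setup (RAdam with default hyper-parameters, a ReduceLROnPlateau scheduler stepped on validation accuracy, and early stopping); it is not taken from the released repository, and the backbone, validation routine, and patience value are placeholder assumptions. It also uses `torch.optim.RAdam` in place of the standalone public implementation the paper cites.

```python
import torch
from torch import nn, optim

# Placeholder backbone; the actual few-shot model and data pipeline live in the
# authors' released repository (https://github.com/AI-secure/multi-task-learning).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, 64))

def validate(net: nn.Module) -> float:
    """Hypothetical stand-in for measuring few-shot validation accuracy."""
    with torch.no_grad():
        x = torch.randn(8, 3, 84, 84)  # dummy batch of 84x84 images
        return net(x).softmax(dim=-1).max(dim=-1).values.mean().item()

optimizer = optim.RAdam(model.parameters())  # RAdam with default hyper-parameters
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max")

best_acc, patience, bad_epochs = 0.0, 10, 0  # early-stopping state (assumed patience)
for epoch in range(100):
    # ... one epoch of multi-task training over sampled few-shot tasks ...
    acc = validate(model)
    scheduler.step(acc)  # lower the learning rate when validation accuracy plateaus
    if acc > best_acc:
        best_acc, bad_epochs = acc, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping regularization
            break
```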