K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning

Authors: Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, Andrew Howard

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce a novel method that enables parameter-efficient transfer and multitask learning with deep neural networks. The basic approach is to learn a model patch, a small set of parameters that will specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases is sufficient to convert a pretrained network to perform well on qualitatively different problems (e.g. converting a Single Shot MultiBox Detection (SSD) model into a 1000-class image classification model while reusing 98% of parameters of the SSD feature extractor). Similarly, we show that re-learning existing low-parameter layers (such as depth-wise convolutions) while keeping the rest of the network frozen also improves transfer-learning accuracy significantly. Our approach allows both simultaneous (multi-task) as well as sequential transfer learning. In several multi-task learning problems, despite using much fewer parameters than traditional logits-only fine-tuning, we match single-task performance. ... 5 EXPERIMENTS We evaluate the performance of our method in both transfer and multi-task learning using the image recognition networks Mobilenet V2 (Sandler et al., 2018) and Inception V3 (Szegedy et al., 2016) and a variety of datasets: ImageNet (Deng et al., 2009), CIFAR-10/100 (Krizhevsky, 2009), Cars (Krause et al., 2013), Aircraft (Maji et al., 2013), Flowers-102 (Nilsback & Zisserman, 2008) and Places365 (Zhou et al., 2017). An overview of these datasets can be found in Table 1. We also show preliminary results on transfer learning across completely different types of tasks using Mobilenet V2 and Single-Shot Multibox Detector (SSD) (Liu et al., 2016) networks. (A minimal sketch of the scale-and-bias patch appears after the table.)
Researcher Affiliation | Collaboration | Pramod Kaushik Mudrakarta, The University of Chicago, pramodkm@uchicago.edu; Mark Sandler, Andrey Zhmoginov, Andrew Howard, Google Inc., {sandler,azhmogin,howarda}@google.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a link to its source code or an explicit statement that the code will be released.
Open Datasets | Yes | We use the image recognition networks Mobilenet V2 (Sandler et al., 2018) and Inception V3 (Szegedy et al., 2016) and a variety of datasets: ImageNet (Deng et al., 2009), CIFAR-10/100 (Krizhevsky, 2009), Cars (Krause et al., 2013), Aircraft (Maji et al., 2013), Flowers-102 (Nilsback & Zisserman, 2008) and Places365 (Zhou et al., 2017). An overview of these datasets can be found in Table 1.
Dataset Splits | No | The paper lists the datasets used but does not explicitly provide the training/validation/test splits (e.g., percentages or sample counts) needed for reproducibility. It mentions 'Multi-task validation accuracy' in Table 4, implying a validation set was used, but gives no details of the split.
Hardware Specification | Yes | We use TensorFlow (Abadi et al., 2015), and NVIDIA P100 and V100 GPUs for our experiments.
Software Dependencies | No | The paper mentions 'TensorFlow (Abadi et al., 2015)' but does not provide a specific version number for TensorFlow or any other software dependency.
Experiment Setup | Yes | Following the standard setup of Mobilenet and Inception, we use 224 x 224 images for Mobilenet V2 and 299 x 299 for Inception V3. As a special case, for the Places-365 dataset we use 256 x 256 images. We use the RMSProp optimizer with a learning rate of 0.045 and decay factor 0.98 per 2.5 epochs. ... In our experiments (Appendix B.2, Figure 9) we observed the opposite behavior when fine-tuning only small model patches: the accuracy grows as the learning rate increases. ... The learning rate schedule is the same as in Section 5.3. (An illustrative sketch of this schedule follows the table.)
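For concreteness, below is a minimal sketch of the scale-and-bias model patch quoted in the Research Type row: freeze a pretrained MobileNetV2 and re-learn only the per-channel BatchNorm scales and biases together with a new task head. This is not the authors' code; the paper's experiments use TensorFlow, and the torchvision model, the Flowers-102 head size, and the optimizer setting here are illustrative assumptions.

```python
import torch
from torchvision import models

# Hypothetical sketch: fine-tune only a "scale-and-bias" patch of a pretrained
# MobileNetV2 (per-channel BatchNorm gamma/beta) plus a new classifier head,
# keeping every other weight frozen.
model = models.mobilenet_v2(weights="IMAGENET1K_V1")  # assumes torchvision >= 0.13

# Freeze everything first.
for p in model.parameters():
    p.requires_grad = False

# Unfreeze the patch: BatchNorm scales (weight) and biases.
patch_params = []
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        m.weight.requires_grad = True   # per-channel scale (gamma)
        m.bias.requires_grad = True     # per-channel bias (beta)
        patch_params += [m.weight, m.bias]

# A task-specific head is trained alongside the patch (102 classes is illustrative,
# e.g. Flowers-102).
model.classifier[1] = torch.nn.Linear(model.last_channel, 102)
patch_params += list(model.classifier[1].parameters())

optimizer = torch.optim.RMSprop(patch_params, lr=0.045)
```

Only the parameters collected in `patch_params` are passed to the optimizer, so the bulk of the feature extractor is reused unchanged, which is the point of the model-patch approach.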
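The learning-rate setting quoted in the Experiment Setup row (RMSProp at 0.045, decay factor 0.98 per 2.5 epochs) could be approximated with a step schedule along the following lines; the stand-in model and the steps-per-epoch value are placeholders, not taken from the paper.

```python
import torch

# Illustrative sketch (not the authors' code) of the reported schedule.
model = torch.nn.Linear(1280, 102)   # stand-in for the patched network / task head
steps_per_epoch = 1000               # assumption: len(dataset) // batch_size

optimizer = torch.optim.RMSprop(model.parameters(), lr=0.045)
# StepLR multiplies the learning rate by gamma every `step_size` calls, which matches
# "decay factor 0.98 per 2.5 epochs" when the scheduler is stepped once per batch.
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=int(2.5 * steps_per_epoch), gamma=0.98)
# During training, call optimizer.step() and then scheduler.step() once per batch.
```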