Zoo-Tuning: Adaptive Transfer from A Zoo of Models

Authors: Yang Shu, Zhi Kou, Zhangjie Cao, Jianmin Wang, Mingsheng Long

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection. Experiment results demonstrate that the proposed adaptive transfer learning approach can more effectively and efficiently transfer knowledge from a zoo of models.
Researcher Affiliation | Academia | School of Software, BNRist, Tsinghua University, Beijing, China. E-mail: Yang Shu (shuy18@mails.tsinghua.edu.cn).
Pseudocode | No | The paper describes the method using text and figures, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper does not contain any statements about releasing its own source code or providing a link to a code repository.
Open Datasets | Yes | For the zoo of models in the classification setting, we use 5 ResNet-50 models pretrained on representative computer vision datasets: (1) Supervised pretrained model and (2) Unsupervised pretrained model with MOCO (He et al., 2020) on ImageNet (Russakovsky et al., 2015), (3) Mask R-CNN (He et al., 2017) model for detection and instance segmentation, (4) DeepLabV3 (Chen et al., 2018) model for semantic segmentation, and (5) Keypoint R-CNN model for keypoint detection, pretrained on COCO-2017 challenge datasets of each task. [A hedged loading sketch follows this table.]
Dataset Splits | Yes | DMLab contains frames observed by the agent acting in the DeepMind Lab environment, which are annotated by the distance between the agent and various objects present in the environment. The data are split into 65,550 training images, 22,628 validation images and 22,735 test images.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications) used for running the experiments. It mentions using the PyTorch framework but gives no hardware information.
Software Dependencies | No | The paper mentions using the PyTorch framework but does not specify version numbers or other software dependencies.
Experiment Setup | Yes | We use Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1 x 10^-4. [...] We adopt SGD with a learning rate of 0.01 and momentum of 0.9 with the same training strategy (total 15k iterations for fine-tuning with learning rate decay per 6k iterations) for all pretrained models, compared methods and the proposed Zoo-Tuning. We adopt a batch size of 48, and all images are randomly resized and cropped to 224 x 224 as the input of the network. [...] The models are trained for 60 epochs with a batch size of 16. We use Adam optimizer (Kingma & Ba, 2015). The base learning rate is 1 x 10^-4 and is decayed by a rate of 0.1 at the 30-th and 50-th epochs. [A hedged training-loop sketch follows this table.]
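
The Open Datasets row above lists the five ResNet-50 checkpoints that make up the model zoo. As a point of reference, the sketch below shows one way such a zoo could be assembled from torchvision's pretrained weights. This is an approximation, not the authors' code: torchvision's DeepLabV3 weights differ slightly from the paper's COCO-2017 checkpoints, and the MoCo checkpoint path is a placeholder assumption.

```python
from torchvision import models

# Minimal sketch (not the authors' release): a zoo of ResNet-50 backbones
# roughly matching the five checkpoints quoted in the Open Datasets row,
# built from torchvision's pretrained weights (torchvision >= 0.13 API).
zoo = {
    # (1) supervised ImageNet-1k classification weights
    "imagenet_supervised": models.resnet50(
        weights=models.ResNet50_Weights.IMAGENET1K_V1),
    # (3) Mask R-CNN backbone (detection / instance segmentation on COCO)
    "coco_maskrcnn": models.detection.maskrcnn_resnet50_fpn(
        weights=models.detection.MaskRCNN_ResNet50_FPN_Weights.COCO_V1
    ).backbone.body,
    # (4) DeepLabV3 backbone (semantic segmentation; torchvision's weights
    #     are COCO with VOC labels, close to but not identical to the paper's)
    "coco_deeplabv3": models.segmentation.deeplabv3_resnet50(
        weights=models.segmentation.DeepLabV3_ResNet50_Weights.COCO_WITH_VOC_LABELS_V1
    ).backbone,
    # (5) Keypoint R-CNN backbone (keypoint detection on COCO)
    "coco_keypointrcnn": models.detection.keypointrcnn_resnet50_fpn(
        weights=models.detection.KeypointRCNN_ResNet50_FPN_Weights.COCO_V1
    ).backbone.body,
}

# (2) The MoCo unsupervised weights are not shipped with torchvision; they
# would be loaded from a separately downloaded checkpoint (placeholder path),
# and MoCo key prefixes typically need stripping before load_state_dict:
# state = torch.load("moco_checkpoint.pth", map_location="cpu")["state_dict"]
```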
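
The Experiment Setup row quotes several training recipes. Below is a minimal PyTorch sketch of the classification fine-tuning schedule (SGD, learning rate 0.01, momentum 0.9, 15k iterations with decay every 6k iterations, batch size 48, 224 x 224 random-resized crops). The `model` and `train_loader` objects are placeholders, and the decay factor of 0.1 is an assumption the quote does not state; the facial-landmark recipe (Adam, 60 epochs) appears in the trailing comment.

```python
from torch import nn, optim
from torchvision import transforms

# Preprocessing quoted above: random resize-and-crop to 224 x 224
# (to be attached to the training dataset before building `train_loader`).
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])

def fine_tune(model, train_loader, total_iters=15_000, decay_every=6_000):
    """Hedged sketch of the quoted classification fine-tuning schedule."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    # Stepping StepLR once per iteration gives "decay per 6k iterations";
    # gamma=0.1 is an assumption, the decay factor is not stated in the quote.
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=decay_every, gamma=0.1)

    data_iter = iter(train_loader)
    for _ in range(total_iters):
        try:
            images, labels = next(data_iter)
        except StopIteration:
            data_iter = iter(train_loader)
            images, labels = next(data_iter)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
    return model

# Facial landmark detection recipe from the same row (60 epochs, batch 16):
# optimizer = optim.Adam(model.parameters(), lr=1e-4)
# scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 50], gamma=0.1)
```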