Zoo-Tuning: Adaptive Transfer from A Zoo of Models
Authors: Yang Shu, Zhi Kou, Zhangjie Cao, Jianmin Wang, Mingsheng Long
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection. Experiment results demonstrate that the proposed adaptive transfer learning approach can more effectively and efficiently transfer knowledge from a zoo of models. |
| Researcher Affiliation | Academia | 1School of Software, BNRist, Tsinghua University, Beijing, China. E-mail: Yang Shu (shuy18@mails.tsinghua.edu.cn). |
| Pseudocode | No | The paper describes the method using text and figures, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not contain any statements about releasing its own source code or providing a link to a code repository. |
| Open Datasets | Yes | For the zoo of models in the classification setting, we use 5 ResNet-50 models pretrained on representative computer vision datasets: (1) Supervised pretrained model and (2) Unsupervised pretrained model with MoCo (He et al., 2020) on ImageNet (Russakovsky et al., 2015), (3) Mask R-CNN (He et al., 2017) model for detection and instance segmentation, (4) DeepLabV3 (Chen et al., 2018) model for semantic segmentation, and (5) Keypoint R-CNN model for keypoint detection, pretrained on COCO-2017 challenge datasets of each task. |
| Dataset Splits | Yes | DMLab contains frames observed by the agent acting in the DeepMind Lab environment, which are annotated by the distance between the agent and various objects present in the environment. The data are split into 65,550 training images, 22,628 validation images and 22,735 test images. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications) used for running the experiments. It mentions using the PyTorch framework but does not specify hardware. |
| Software Dependencies | No | The paper mentions using the PyTorch framework, but does not specify version numbers or other software dependencies. |
| Experiment Setup | Yes | We use Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1 x 10^-4. [...] We adopt SGD with a learning rate of 0.01 and momentum of 0.9 with the same training strategy (total 15k iterations for fine-tuning with learning rate decay per 6k iterations) for all pretrained models, compared methods and the proposed Zoo-Tuning. We adopt a batch size of 48, and all images are randomly resized and cropped to 224 x 224 as the input of the network. [...] The models are trained for 60 epochs with a batch size of 16. We use Adam optimizer (Kingma & Ba, 2015). The base learning rate is 1 x 10^-4 and is decayed by a rate of 0.1 at the 30th and 50th epochs. |
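The two fine-tuning recipes quoted in the Experiment Setup row can be sketched in PyTorch. This is a minimal sketch, not the authors' code: `model` is a toy stand-in for the fine-tuned network, the loss/backward steps are elided, and the decay factor 0.1 for the SGD schedule is an assumption (the report quotes decay every 6k iterations but not the factor).

```python
from torch import nn
from torch.optim import SGD, Adam
from torch.optim.lr_scheduler import StepLR, MultiStepLR

# Toy stand-in for the fine-tuned network (any nn.Module would do).
model = nn.Linear(2048, 10)

# Classification setting as quoted: SGD, lr 0.01, momentum 0.9,
# 15k iterations with lr decay every 6k iterations (factor 0.1 is an
# assumption). Batch size 48, inputs randomly resized/cropped to 224x224.
sgd = SGD(model.parameters(), lr=0.01, momentum=0.9)
sgd_sched = StepLR(sgd, step_size=6_000, gamma=0.1)
for _ in range(15_000):
    # ... forward pass, loss computation, loss.backward() on a batch ...
    sgd.step()
    sgd_sched.step()

# Facial landmark setting as quoted: Adam, base lr 1e-4, 60 epochs,
# batch size 16, lr decayed by 0.1 at the 30th and 50th epochs.
adam = Adam(model.parameters(), lr=1e-4)
adam_sched = MultiStepLR(adam, milestones=[30, 50], gamma=0.1)
for _ in range(60):
    # ... one training epoch ...
    adam.step()
    adam_sched.step()
```

After 15k iterations the SGD learning rate has decayed twice (0.01 → 1e-4), and after 60 epochs the Adam learning rate has decayed twice (1e-4 → 1e-6), matching the quoted schedules.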