What Makes Instance Discrimination Good for Transfer Learning?
Authors: Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings are threefold. First, what truly matters for the transfer is low-level and mid-level representations, not high-level representations. Second, the intra-category invariance enforced by the traditional supervised model weakens transferability by increasing task misalignment. Finally, supervised pretraining can be strengthened by following an exemplar-based approach without explicit constraints among the instances within the same category. We study the transfer performance of pretrained models for a set of downstream tasks: object detection on PASCAL VOC07, object detection and instance segmentation on MSCOCO, and semantic segmentation on Cityscapes. (A minimal sketch of the instance-discrimination objective appears after this table.) |
| Researcher Affiliation | Collaboration | ¹City University of Hong Kong, ²Microsoft Research Asia |
| Pseudocode | No | The paper describes its methods in prose, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project URL (http://nxzhao.com/projects/good_transfer/), but it does not explicitly state that the source code for the described methodology is openly released, nor does it link to a source code repository such as GitHub. |
| Open Datasets | Yes | We study the transfer performance of pretrained models for a set of downstream tasks: object detection on PASCAL VOC07, object detection and instance segmentation on MSCOCO, and semantic segmentation on Cityscapes. The pretraining method MoCo (He et al., 2020) established a milestone by outperforming the supervised counterpart, with an AP of 46.6 compared to 42.4 on PASCAL VOC object detection. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009. |
| Dataset Splits | Yes | We study the transfer performance of pretrained models for a set of downstream tasks: object detection on PASCAL VOC07, object detection and instance segmentation on MSCOCO, and semantic segmentation on Cityscapes. For the base classes, we split their data into training and validation sets to evaluate base task performance. |
| Hardware Specification | No | The paper mentions running experiments on "8 GPUs" and "4 GPUs" but does not specify the exact models or other hardware details (CPU, RAM, specific machine types, or cloud instances) used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow), programming languages (e.g., Python), or other libraries used in the implementation. |
| Experiment Setup | Yes | For object detection on PASCAL VOC07, we use the ResNet50-C4 architecture in the Faster R-CNN framework (Ren et al., 2015). Optimization takes 9k iterations on 8 GPUs with a batch size of 2 images per GPU. The learning rate is initialized to 0.02 and decayed to be 10 times smaller after 6k and 8k iterations. For semantic segmentation on Cityscapes, we use the DeepLab-v3 architecture (Chen et al., 2017) with image crops of 512 by 1024. Optimization takes 40k iterations on 4 GPUs with a batch size of 2 images per GPU. The learning rate is initialized to 0.01 and decayed with a poly schedule. (A sketch of both learning-rate schedules appears after this table.) |
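
To make the Research Type finding concrete, here is a minimal sketch of an instance-discrimination objective: every image in the batch is treated as its own class, so no invariance is enforced between different instances of the same category. This is a SimCLR-style NT-Xent formulation rather than the MoCo memory-queue variant the paper benchmarks; the function name, the two-views-per-image batch layout, and the temperature value are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def instance_discrimination_loss(features: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss treating each image as its own class.

    `features` stacks two augmented views per image as a (2N, D) tensor,
    where rows i and i+N are views of the same instance.
    """
    features = F.normalize(features, dim=1)
    n = features.shape[0] // 2
    logits = features @ features.t() / temperature      # (2N, 2N) pairwise similarities
    logits.fill_diagonal_(float('-inf'))                # exclude self-similarity
    # The positive for row i is its other view: i+N, wrapping for the second half.
    targets = (torch.arange(2 * n, device=features.device) + n) % (2 * n)
    return F.cross_entropy(logits, targets)
```

A supervised "exemplar" variant in the spirit of the paper's third finding would keep this per-instance loss and simply avoid adding any term that pulls together different images sharing a label.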
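
The fine-tuning schedules quoted in the Experiment Setup row map onto standard PyTorch schedulers. The sketch below is an illustration under assumptions: the SGD momentum of 0.9, the poly power of 0.9, and the stand-in parameters are not specified in the quoted text; only the initial learning rates, decay points, and iteration counts come from the paper.

```python
import torch

# Stand-in parameters; the actual runs fine-tune a ResNet50-C4 Faster R-CNN
# (VOC07) and a DeepLab-v3 model (Cityscapes).
params = [torch.nn.Parameter(torch.zeros(1))]

# VOC07 detection: lr 0.02, reduced 10x after 6k and 8k of 9k iterations.
det_opt = torch.optim.SGD(params, lr=0.02, momentum=0.9)  # momentum assumed
det_sched = torch.optim.lr_scheduler.MultiStepLR(
    det_opt, milestones=[6000, 8000], gamma=0.1)

# Cityscapes segmentation: lr 0.01 with poly decay over 40k iterations.
# The power of 0.9 is the common default and an assumption here.
seg_opt = torch.optim.SGD(params, lr=0.01, momentum=0.9)
seg_sched = torch.optim.lr_scheduler.LambdaLR(
    seg_opt, lr_lambda=lambda it: (1.0 - it / 40000) ** 0.9)

for it in range(9000):   # both papers' schedules step once per training iteration
    det_opt.step()
    det_sched.step()
```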