What Makes Instance Discrimination Good for Transfer Learning?

Authors: Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our findings are threefold. First, what truly matters for the transfer is low-level and mid-level representations, not high-level representations. Second, the intra-category invariance enforced by the traditional supervised model weakens transferability by increasing task misalignment. Finally, supervised pretraining can be strengthened by following an exemplar-based approach without explicit constraints among the instances within the same category. We study the transfer performance of pretrained models for a set of downstream tasks: object detection on PASCAL VOC07, object detection and instance segmentation on MSCOCO, and semantic segmentation on Cityscapes.
Researcher Affiliation | Collaboration | City University of Hong Kong; Microsoft Research Asia
Pseudocode | No | The paper describes its methods in prose, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a project URL (http://nxzhao.com/projects/good_transfer/), but it does not explicitly state that the source code for the described methodology is openly released, nor does it link directly to a source code repository such as GitHub.
Open Datasets | Yes | We study the transfer performance of pretrained models for a set of downstream tasks: object detection on PASCAL VOC07, object detection and instance segmentation on MSCOCO, and semantic segmentation on Cityscapes. The pretraining method MoCo (He et al., 2020) established a milestone by outperforming the supervised counterpart, with an AP of 46.6 compared to 42.4 on PASCAL VOC object detection. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.
Dataset Splits | Yes | We study the transfer performance of pretrained models for a set of downstream tasks: object detection on PASCAL VOC07, object detection and instance segmentation on MSCOCO, and semantic segmentation on Cityscapes. For the base classes, we split their data into training and validation sets to evaluate base task performance.
Hardware Specification | No | The paper mentions running experiments on "8 GPUs" and "4 GPUs" but does not specify the exact models or other hardware details (CPU, RAM, specific machine types, or cloud instances) used for the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow), programming languages (e.g., Python), or other libraries used in the implementation.
Experiment Setup | Yes | For object detection on PASCAL VOC07, we use the ResNet50-C4 architecture in the Faster R-CNN framework (Ren et al., 2015). Optimization takes 9k iterations on 8 GPUs with a batch size of 2 images per GPU. The learning rate is initialized to 0.02 and decayed to be 10 times smaller after 6k and 8k iterations. For semantic segmentation on Cityscapes, we use the DeepLab-v3 architecture (Chen et al., 2017) with image crops of 512 by 1024. Optimization takes 40k iterations on 4 GPUs with a batch size of 2 images per GPU. The learning rate is initialized to 0.01 and decayed with a poly schedule.
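For readers unfamiliar with the exemplar-based objective referenced in the Research Type row, the snippet below is a minimal sketch of an instance-discrimination (InfoNCE-style) loss in the spirit of MoCo: each image instance is treated as its own class, with no explicit constraint tying together instances that share a semantic category. This is an illustrative sketch only; the function name, temperature value, and queue handling are assumptions, not taken from the authors' code.

```python
# Minimal sketch of an instance-discrimination (InfoNCE-style) objective.
# Names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

def instance_discrimination_loss(query, key, queue, temperature=0.07):
    """query, key: (N, D) embeddings of two augmented views of the same images.
    queue: (K, D) embeddings of other instances serving as negatives."""
    query = F.normalize(query, dim=1)
    key = F.normalize(key, dim=1)
    queue = F.normalize(queue, dim=1)

    # Positive logits: similarity between the two views of the same instance.
    l_pos = torch.einsum("nd,nd->n", query, key).unsqueeze(1)   # (N, 1)
    # Negative logits: similarity against every queued (other-instance) key.
    l_neg = torch.einsum("nd,kd->nk", query, queue)             # (N, K)

    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive sits at index 0, so the target label is 0 for every row.
    labels = torch.zeros(logits.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Toy usage with random embeddings.
q, k = torch.randn(8, 128), torch.randn(8, 128)
negatives = torch.randn(4096, 128)
print(instance_discrimination_loss(q, k, negatives).item())
```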
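The fine-tuning recipes quoted in the Experiment Setup row map onto standard PyTorch learning-rate schedules. The sketch below shows only those schedules (a 10x step decay at 6k and 8k of 9k iterations for VOC07 detection, and a poly decay over 40k iterations for Cityscapes segmentation); the placeholder model, the SGD momentum and weight decay, and the poly power of 0.9 are assumed common defaults rather than values reported in the paper, and the actual Faster R-CNN and DeepLab-v3 pipelines are not reproduced here.

```python
# Sketch of the two quoted learning-rate schedules with standard PyTorch schedulers.
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the detection/segmentation network

# VOC07 detection: lr 0.02, decayed 10x after 6k and 8k of 9k total iterations.
det_opt = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9, weight_decay=1e-4)
det_sched = torch.optim.lr_scheduler.MultiStepLR(det_opt, milestones=[6000, 8000], gamma=0.1)

# Cityscapes segmentation: lr 0.01 with polynomial ("poly") decay over 40k iterations.
seg_opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
seg_sched = torch.optim.lr_scheduler.LambdaLR(
    seg_opt, lr_lambda=lambda it: (1.0 - it / 40000) ** 0.9)

# Per-iteration stepping (the forward/backward pass and loss are omitted here).
for it in range(9000):
    det_opt.step()
    det_sched.step()
```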