Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Authors: Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on several standard visual recognition benchmarks, including image classification, object detection, and semantic segmentation, show that the proposed CARE framework improves CNN encoder backbones to state-of-the-art performance.
Researcher Affiliation | Collaboration | The University of Hong Kong; Tencent AI Lab; University of Oxford
Pseudocode | No | The paper describes the method using text and diagrams but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/ChongjianGE/CARE
Open Datasets | Yes | The training images we use during pretext training are from the ImageNet-1k [50] dataset. ... We use the standard VOC-07, VOC-12, and COCO datasets [20, 37].
Dataset Splits | Yes | The ImageNet training set is used for training and the ImageNet validation set is used for evaluation. ... We follow the semi-supervised learning protocol [24, 12] to use 1% and 10% of the training data (the same data splits as in [12]) during finetuning.
Hardware Specification | Yes | We train CARE using 8 Tesla V100 GPUs with a batch size of 1024.
Software Dependencies | No | The paper mentions general software components such as the SGD optimizer and specific network architectures, but does not provide version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The base learning rate is set to 0.05 and scaled linearly with respect to the batch size [23] (i.e., lr_base = 0.05 × BatchSize / 256). We start the pretext training with a warm-up of 10 epochs, during which the learning rate rises linearly from 10^-6 to the base learning rate (lr_base). Then we use a cosine decay schedule for the learning rate, without restarting it [39, 24], to train the network. The momentum update coefficient of the network parameters (denoted as τ) is increased from 0.99 to 1 via a cosine design. ... We train CARE using 8 Tesla V100 GPUs with a batch size of 1024.
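
For concreteness, below is a minimal Python sketch of the two schedules quoted in the experiment-setup row: a linear warm-up from 10^-6 to lr_base followed by cosine decay for the learning rate, and a cosine ramp of the momentum coefficient τ from 0.99 toward 1. The total epoch count, the function names, and the exact form of the τ ramp are illustrative assumptions, not taken from the paper or the CARE codebase.

```python
import math

BASE_LR = 0.05          # base learning rate (from the quoted setup)
BATCH_SIZE = 1024       # batch size (from the quoted setup)
WARMUP_EPOCHS = 10      # linear warm-up length (from the quoted setup)
WARMUP_START_LR = 1e-6  # warm-up starting LR, 10^-6 (from the quoted setup)
TOTAL_EPOCHS = 200      # ASSUMPTION: total epoch count is not stated in this excerpt

# Linear scaling rule from the quoted setup: lr_base = 0.05 * BatchSize / 256
lr_base = BASE_LR * BATCH_SIZE / 256

def learning_rate(epoch: float) -> float:
    """Linear warm-up to lr_base, then cosine decay without restarts."""
    if epoch < WARMUP_EPOCHS:
        return WARMUP_START_LR + (lr_base - WARMUP_START_LR) * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * lr_base * (1.0 + math.cos(math.pi * progress))

def momentum_tau(epoch: float, tau_base: float = 0.99) -> float:
    """Cosine ramp of tau from tau_base (0.99) toward 1 over training.

    ASSUMPTION: this BYOL-style formula is one plausible reading of the
    paper's 'cosine design', which is not spelled out in the excerpt above.
    """
    progress = epoch / TOTAL_EPOCHS
    return 1.0 - (1.0 - tau_base) * 0.5 * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    for e in (0, 5, 10, 100, 200):
        print(f"epoch {e:3d}: lr={learning_rate(e):.4f}  tau={momentum_tau(e):.5f}")
```

With the quoted values, lr_base = 0.05 × 1024 / 256 = 0.2; the sketch warms the learning rate up to 0.2 over the first 10 epochs, decays it to 0 by the final epoch, and ramps τ from 0.99 to exactly 1.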