Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning
Authors: Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several standard visual recognition benchmarks, including image classification, object detection, and semantic segmentation, show that the proposed CARE framework improves CNN encoder backbones to the state-of-the-art performance. |
| Researcher Affiliation | Collaboration | 1The University of Hong Kong 2Tencent AI Lab 3University of Oxford |
| Pseudocode | No | The paper describes the method using text and diagrams but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/ChongjianGE/CARE |
| Open Datasets | Yes | The training images we use during pretext training are from the ImageNet-1k [50] dataset. ... We use the standard VOC-07, VOC-12, and COCO datasets [20, 37]. |
| Dataset Splits | Yes | The ImageNet training set is used for the training and the ImageNet validation set is used for evaluation. ... We follow the semi-supervised learning protocol [24, 12] to use 1% and 10% training data (the same data splits as in [12]) during finetuning. |
| Hardware Specification | Yes | We train CARE using 8 Tesla V100 GPUs with a batch size of 1024. |
| Software Dependencies | No | The paper mentions general software components like SGD optimizer and specific network architectures but does not provide specific version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The base learning rate is set as 0.05 and scaled linearly with respect to the batch size [23] (i.e., lr_base = 0.05 × BatchSize / 256). We start the pretext training with a warm-up of 10 epochs where the learning rate rises linearly from 10⁻⁶ to the base learning rate (lr_base). Then, we use a cosine decay schedule for the learning rate without restarting it [39, 24] to train the network. The momentum update coefficient of network parameters (denoted as τ) is increased from 0.99 to 1 via a cosine design... We train CARE using 8 Tesla V100 GPUs with a batch size of 1024. |
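
For orientation, below is a minimal sketch of the schedules quoted in the "Experiment Setup" row: linear LR scaling with batch size, a 10-epoch linear warm-up from 10⁻⁶, cosine decay without restarts, and a cosine increase of the momentum coefficient τ from 0.99 to 1. The function names, the total epoch count, and the exact cosine forms are assumptions following common BYOL-style recipes [39, 24], not code taken from the official CARE repository (https://github.com/ChongjianGE/CARE), which remains the authoritative reference.

```python
import math


def scaled_base_lr(batch_size: int, lr_base: float = 0.05) -> float:
    """Linear scaling rule quoted above: lr = 0.05 * BatchSize / 256."""
    return lr_base * batch_size / 256


def learning_rate(epoch: float, total_epochs: int, batch_size: int = 1024,
                  warmup_epochs: int = 10, warmup_start: float = 1e-6) -> float:
    """Linear warm-up for 10 epochs, then cosine decay without restarts (assumed form)."""
    lr_max = scaled_base_lr(batch_size)
    if epoch < warmup_epochs:
        # Rise linearly from 1e-6 to the scaled base learning rate.
        return warmup_start + (lr_max - warmup_start) * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * lr_max * (1 + math.cos(math.pi * progress))


def momentum_coefficient(epoch: float, total_epochs: int,
                         tau_base: float = 0.99) -> float:
    """Momentum-update coefficient tau increased from 0.99 to 1 via a cosine design (assumed form)."""
    return 1 - (1 - tau_base) * 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))
```

Under these assumptions, `learning_rate(0, 100)` returns 10⁻⁶, `learning_rate(10, 100)` returns the scaled base rate of 0.2 for a batch size of 1024 (0.05 × 1024 / 256), and `momentum_coefficient` moves from 0.99 at epoch 0 to 1 at the final epoch.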