Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Authors: Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on several standard visual recognition benchmarks, including image classification, object detection, and semantic segmentation, show that the proposed CARE framework improves CNN encoder backbones to state-of-the-art performance.
Researcher Affiliation | Collaboration | The University of Hong Kong; Tencent AI Lab; University of Oxford
Pseudocode | No | The paper describes the method using text and diagrams but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/ChongjianGE/CARE
Open Datasets | Yes | The training images we use during pretext training are from the ImageNet-1k [50] dataset. ... We use the standard VOC-07, VOC-12, and COCO datasets [20, 37].
Dataset Splits | Yes | The ImageNet training set is used for training and the ImageNet validation set is used for evaluation. ... We follow the semi-supervised learning protocol [24, 12] to use 1% and 10% of the training data (the same data splits as in [12]) during finetuning.
Hardware Specification | Yes | We train CARE using 8 Tesla V100 GPUs with a batch size of 1024.
Software Dependencies | No | The paper mentions general software components such as the SGD optimizer and specific network architectures, but does not provide version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The base learning rate is set to 0.05 and scaled linearly with respect to the batch size [23] (i.e., lr_base = 0.05 × BatchSize / 256). We start the pretext training with a warm-up of 10 epochs, during which the learning rate rises linearly from 10^-6 to the base learning rate (lr_base). Then we use a cosine decay schedule for the learning rate, without restarting it [39, 24], to train the network. The momentum update coefficient of the network parameters (denoted as τ) is increased from 0.99 to 1 via a cosine design. ... We train CARE using 8 Tesla V100 GPUs with a batch size of 1024.
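
For concreteness, below is a minimal Python sketch of the two schedules quoted in the experiment-setup row: a linear warm-up from 10^-6 to lr_base followed by cosine decay for the learning rate, and a cosine ramp of the momentum coefficient τ from 0.99 toward 1. The total epoch count, the function names, and the exact form of the τ ramp are illustrative assumptions, not taken from the paper or the CARE codebase.

```python
import math

BASE_LR = 0.05          # base learning rate (from the quoted setup)
BATCH_SIZE = 1024       # batch size (from the quoted setup)
WARMUP_EPOCHS = 10      # linear warm-up length (from the quoted setup)
WARMUP_START_LR = 1e-6  # warm-up starting LR, 10^-6 (from the quoted setup)
TOTAL_EPOCHS = 200      # ASSUMPTION: total epoch count is not stated in this excerpt

# Linear scaling rule from the quoted setup: lr_base = 0.05 * BatchSize / 256
lr_base = BASE_LR * BATCH_SIZE / 256

def learning_rate(epoch: float) -> float:
    """Linear warm-up to lr_base, then cosine decay without restarts."""
    if epoch < WARMUP_EPOCHS:
        return WARMUP_START_LR + (lr_base - WARMUP_START_LR) * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * lr_base * (1.0 + math.cos(math.pi * progress))

def momentum_tau(epoch: float, tau_base: float = 0.99) -> float:
    """Cosine ramp of tau from tau_base (0.99) toward 1 over training.

    ASSUMPTION: this BYOL-style formula is one plausible reading of the
    paper's 'cosine design', which is not spelled out in the excerpt above.
    """
    progress = epoch / TOTAL_EPOCHS
    return 1.0 - (1.0 - tau_base) * 0.5 * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    for e in (0, 5, 10, 100, 200):
        print(f"epoch {e:3d}: lr={learning_rate(e):.4f}  tau={momentum_tau(e):.5f}")
```

With the quoted values, lr_base = 0.05 × 1024 / 256 = 0.2; the sketch warms the learning rate up to 0.2 over the first 10 epochs, decays it to 0 by the final epoch, and ramps τ from 0.99 to exactly 1.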