Joint Contrastive Learning with Infinite Possibilities

Authors: Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate these proposals on multiple benchmarks, demonstrating considerable improvements over existing algorithms. In this section, we empirically evaluate and analyze the hypotheses that directly emanated from the design of JCL. Specifically, we perform the pre-training on the ImageNet1K [10] dataset that contains 1.2M images evenly distributed across 1,000 classes. Following the protocols in [8, 18], we verify the effectiveness of JCL pre-trained features via the following evaluations: 1) Linear classification accuracy on ImageNet1K. 2) Generalization capability of features when transferred to alternative downstream tasks, including object detection [5, 39], instance segmentation [19] and keypoint detection [19] on the MS COCO [31] dataset. 3) Ablation studies that reveal the effectiveness of each component in our losses. 4) Statistical analysis on features that validates our hypothesis and proposals in the previous sections.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, Hefei, China; 2 JD AI Research, Beijing, China
Pseudocode | Yes | Algorithm 1 summarizes the algorithmic flow of the JCL procedure. (A hedged sketch of this objective appears after the table.)
Open Source Code | Yes | Code is publicly available at: https://github.com/caiqi/Joint-Contrastive-Learning.
Open Datasets | Yes | We perform the pre-training on the ImageNet1K [10] dataset that contains 1.2M images evenly distributed across 1,000 classes. Downstream transfer experiments use the MS COCO [31] dataset.
Dataset Splits | Yes | We perform the pre-training on the ImageNet1K [10] dataset that contains 1.2M images evenly distributed across 1,000 classes. Following the protocols in [8, 18], we verify the effectiveness of JCL pre-trained features via the following evaluations: 1) Linear classification accuracy on ImageNet1K. For the hyper-parameters, we use positive key number M = 5, softmax temperature τ = 0.2 and λ = 4.0 in Eq.(8)... We train JCL for 200 epochs with an initial learning rate of lr = 0.06, and lr is gradually annealed following a cosine decay schedule [32]. The classifier is trained for 100 epochs, while the learning rate lr is decayed by 0.1 at the 60th and the 80th epoch respectively.
Hardware Specification | Yes | This queuing trick also allows for feasible training on a typical 8-GPU machine and achieves state-of-the-art learning performances. The batch size is set to N = 512, which enables applicable implementations on an 8-GPU machine. The training is performed on a 4-GPU machine and each GPU carries 4 images at a time. (The queuing trick is sketched after the table.)
Software Dependencies | No | The paper mentions software components and models like ResNet-50, Faster R-CNN, and FPN. However, it does not specify version numbers for these or for any other software dependencies (e.g., PyTorch version, CUDA version).
Experiment Setup | Yes | For the hyper-parameters, we use positive key number M = 5, softmax temperature τ = 0.2 and λ = 4.0 in Eq.(8)... The dimension of this embedding is d = 128 across all experiments. The batch size is set to N = 512, which enables applicable implementations on an 8-GPU machine. We train JCL for 200 epochs with an initial learning rate of lr = 0.06, and lr is gradually annealed following a cosine decay schedule [32]. The batch size is set as N = 256 and the learning rate lr = 30 at this stage... The classifier is trained for 100 epochs, while the learning rate lr is decayed by 0.1 at the 60th and the 80th epoch respectively. We train all models for 90k iterations, which is commonly referred to as the 1× schedule in [18]. We vary the number of positive keys used for the estimate of µ_{k_i^+} and Σ_{k_i^+}. We vary λ in the range of [0.0, 10.0]. The temperature τ [22] affects the flatness of the softmax function and the confidence of each positive pair. From Fig. 2(c), the optimal τ turns out to be around 0.2. (These settings are consolidated in a sketch after the table.)
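
The pseudocode and experiment-setup rows cite Algorithm 1 and the Eq.(8) loss with M positive keys per query, a softmax temperature τ and a weight λ. Below is a minimal PyTorch sketch of one plausible reading of that objective, assuming the M positive keys are summarized by their per-sample mean and covariance and that a λ-weighted variance term enters the log-sum-exp over queued negatives; the function and argument names (jcl_loss, pos_keys, neg_queue) are illustrative, and the exact form of Eq.(8) should be taken from the paper rather than from this sketch.

```python
import torch

def jcl_loss(q, pos_keys, neg_queue, tau=0.2, lam=4.0):
    """q: (N, d) queries; pos_keys: (N, M, d) positive keys; neg_queue: (K, d) negatives.
    All rows are assumed L2-normalized. Hypothetical reading of the JCL objective."""
    mu = pos_keys.mean(dim=1)                                   # (N, d) per-sample mean of the M positive keys
    centered = pos_keys - mu.unsqueeze(1)                       # (N, M, d)
    # q^T Sigma q computed without materializing the (d, d) covariance:
    # Sigma = centered^T centered / M, so q^T Sigma q = mean_m (centered_m . q)^2
    q_sigma_q = (torch.einsum('nmd,nd->nm', centered, q) ** 2).mean(dim=1)   # (N,)
    pos_align = (q * mu).sum(dim=1) / tau                       # q . mu / tau
    pos_logit = pos_align + lam * q_sigma_q / (2 * tau ** 2)    # variance-inflated positive logit
    neg_logits = q @ neg_queue.t() / tau                        # (N, K)
    # Bound-style loss: -q.mu/tau plus logsumexp over the positive and negative logits
    denom = torch.logsumexp(torch.cat([pos_logit.unsqueeze(1), neg_logits], dim=1), dim=1)
    return (denom - pos_align).mean()
```

Computing q^T Σ q through the centered keys keeps the cost at O(NMd) per batch instead of building a d × d covariance matrix for every sample.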
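
The hardware row credits a "queuing trick" with making training feasible on a typical 8-GPU machine. A MoCo-style FIFO memory of momentum-encoder keys is the standard form of such a trick; the sketch below assumes that design, and the class name NegativeQueue and the 65,536-entry default size are assumptions rather than values quoted from the paper.

```python
import torch
import torch.nn.functional as F

class NegativeQueue:
    """FIFO memory of momentum-encoder keys used as negatives, decoupling the
    number of negatives from the per-GPU batch size (MoCo-style queuing trick)."""

    def __init__(self, dim=128, size=65536):
        # Random unit vectors as a harmless initial fill; overwritten as training proceeds.
        self.buffer = F.normalize(torch.randn(size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def push(self, keys):
        """Enqueue the newest keys, overwriting the oldest entries in circular order."""
        n = keys.shape[0]
        idx = (self.ptr + torch.arange(n)) % self.buffer.shape[0]
        self.buffer[idx] = keys
        self.ptr = int((self.ptr + n) % self.buffer.shape[0])
```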
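
The dataset-splits and experiment-setup rows quote the training recipe piecemeal. The sketch below consolidates those numbers and spells out the two learning-rate rules they imply (cosine decay over the 200 pre-training epochs; ×0.1 steps at the 60th and 80th epochs for the linear classifier). The container names PRETRAIN and LINEAR_EVAL are hypothetical.

```python
import math

# Hyper-parameters quoted in the rows above; the dict names are illustrative.
PRETRAIN = dict(
    dataset="ImageNet1K", epochs=200, batch_size=512, base_lr=0.06,
    embedding_dim=128, positive_keys_M=5, temperature_tau=0.2, lambda_weight=4.0,
)
LINEAR_EVAL = dict(epochs=100, batch_size=256, base_lr=30.0, milestones=(60, 80), gamma=0.1)

def pretrain_lr(epoch):
    """Cosine-annealed learning rate over the 200 pre-training epochs."""
    return PRETRAIN["base_lr"] * 0.5 * (1.0 + math.cos(math.pi * epoch / PRETRAIN["epochs"]))

def linear_eval_lr(epoch):
    """Linear-classifier learning rate: multiplied by 0.1 at the 60th and 80th epochs."""
    drops = sum(epoch >= m for m in LINEAR_EVAL["milestones"])
    return LINEAR_EVAL["base_lr"] * LINEAR_EVAL["gamma"] ** drops
```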