Joint Contrastive Learning with Infinite Possibilities
Authors: Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate these proposals on multiple benchmarks, demonstrating considerable improvements over existing algorithms. In this section, we empirically evaluate and analyze the hypotheses that directly emanated from the design of JCL. Specifically, we perform the pre-training on the ImageNet1K [10] dataset, which contains 1.2M images evenly distributed across 1,000 classes. Following the protocols in [8, 18], we verify the effectiveness of JCL pre-trained features via the following evaluations: 1) Linear classification accuracy on ImageNet1K. 2) Generalization capability of features when transferred to alternative downstream tasks, including object detection [5, 39], instance segmentation [19] and keypoint detection [19] on the MS COCO [31] dataset. 3) Ablation studies that reveal the effectiveness of each component in our losses. 4) Statistical analysis of features that validates our hypotheses and proposals in the previous sections. (A hedged sketch of the linear-probe protocol behind evaluation 1 appears after the table.) |
| Researcher Affiliation | Collaboration | ¹ University of Science and Technology of China, Hefei, China; ² JD AI Research, Beijing, China |
| Pseudocode | Yes | Algorithm 1 summarizes the algorithmic flow of the JCL procedure. (A hedged sketch of one JCL-style training step appears after the table.) |
| Open Source Code | Yes | Code is publicly available at: https://github.com/caiqi/Joint-Contrastive-Learning. |
| Open Datasets | Yes | We perform the pre-training on the ImageNet1K [10] dataset, which contains 1.2M images evenly distributed across 1,000 classes. Downstream transfer tasks use the MS COCO [31] dataset. |
| Dataset Splits | Yes | We perform the pre-training on the ImageNet1K [10] dataset, which contains 1.2M images evenly distributed across 1,000 classes. Following the protocols in [8, 18], we verify the effectiveness of JCL pre-trained features via the following evaluations: 1) Linear classification accuracy on ImageNet1K. For the hyper-parameters, we use positive key number M = 5, softmax temperature τ = 0.2 and λ = 4.0 in Eq. (8)... We train JCL for 200 epochs with an initial learning rate of lr = 0.06, and lr is gradually annealed following a cosine decay schedule [32]. The classifier is trained for 100 epochs, while the learning rate lr is decayed by 0.1 at the 60th and the 80th epoch, respectively. (These schedules are written out in a sketch after the table.) |
| Hardware Specification | Yes | This queuing trick also allows for feasible training on a typical 8-GPU machine and achieves state-of-the-art learning performances. The batch size is set to N = 512, which enables practical implementation on an 8-GPU machine. The training is performed on a 4-GPU machine and each GPU carries 4 images at a time. |
| Software Dependencies | No | The paper mentions software components and models like ResNet-50, Faster R-CNN, and FPN. However, it does not specify version numbers for these or any other software dependencies (e.g., PyTorch version, CUDA version). |
| Experiment Setup | Yes | For the hyper-parameters, we use positive key number M = 5, softmax temperature τ = 0.2 and λ = 4.0 in Eq. (8)... The dimension of this embedding is d = 128 across all experiments. The batch size is set to N = 512, which enables practical implementation on an 8-GPU machine. We train JCL for 200 epochs with an initial learning rate of lr = 0.06, and lr is gradually annealed following a cosine decay schedule [32]. The batch size is set as N = 256 and the learning rate lr = 30 at this stage... The classifier is trained for 100 epochs, while the learning rate lr is decayed by 0.1 at the 60th and the 80th epoch, respectively. We train all models for 90k iterations, which is commonly referred to as the 1× schedule in [18]. We vary the number of positive keys used to estimate the positive-key mean µ_{k_i^+} and covariance Σ_{k_i^+}. We vary λ in the range of [0.0, 10.0]. The temperature τ [22] affects the flatness of the softmax function and the confidence of each positive pair. From Fig. 2(c), the optimal τ turns out to be around 0.2. |
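
For reference, below is a minimal sketch of what one JCL-style training objective might look like, assuming a MoCo-style setup with a momentum encoder and a negative queue (consistent with the "queuing trick" quoted above). The loss shown is only one plausible reading of Eq. (8): a Jensen-style upper bound under a Gaussian assumption on the positive keys, combining the empirical mean µ and covariance Σ of the M positive-key embeddings with the weights τ and λ. The exact formulation should be taken from Eq. (8) in the paper or the released code; names such as `jcl_loss` are illustrative, and the encoder/queue-update machinery is omitted.

```python
import torch
import torch.nn.functional as F

def jcl_loss(q, pos_keys, neg_queue, tau=0.2, lam=4.0):
    """Hedged sketch of a JCL-style objective (one reading of Eq. (8)).

    q:         (N, d)    L2-normalized query embeddings
    pos_keys:  (N, M, d) L2-normalized embeddings of M positive keys per query
    neg_queue: (K, d)    L2-normalized negative embeddings from the queue
    """
    # Empirical statistics of the positive keys for each query.
    mu = pos_keys.mean(dim=1)                                             # (N, d)
    centered = pos_keys - mu.unsqueeze(1)                                 # (N, M, d)
    sigma = torch.einsum('nmd,nme->nde', centered, centered) / pos_keys.size(1)  # (N, d, d)

    # Positive term: q·mu/tau plus a lambda-weighted variance term q^T Sigma q / (2 tau^2),
    # i.e. the Jensen-style bound obtained under a Gaussian assumption on positives.
    q_mu = (q * mu).sum(dim=1) / tau                                      # (N,)
    q_sigma_q = torch.einsum('nd,nde,ne->n', q, sigma, q)                 # (N,)
    pos_logit = q_mu + lam * q_sigma_q / (2.0 * tau ** 2)                 # (N,)

    # Negative logits against the queue.
    neg_logits = q @ neg_queue.t() / tau                                  # (N, K)

    # -q·mu/tau + logsumexp over {positive bound term, negatives}.
    all_logits = torch.cat([pos_logit.unsqueeze(1), neg_logits], dim=1)
    return (-q_mu + torch.logsumexp(all_logits, dim=1)).mean()

# Example shapes matching the quoted hyper-parameters (M = 5, d = 128, N = 512);
# the queue length K used here is arbitrary, not taken from the paper.
q = F.normalize(torch.randn(512, 128), dim=1)
pos = F.normalize(torch.randn(512, 5, 128), dim=2)
queue = F.normalize(torch.randn(4096, 128), dim=1)
loss = jcl_loss(q, pos, queue, tau=0.2, lam=4.0)
```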
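
The quoted training schedules can be written down directly: cosine decay from lr = 0.06 over 200 pre-training epochs, and lr = 30 with ×0.1 steps at epochs 60 and 80 over 100 linear-classification epochs. The helper names below are illustrative, not from the repository.

```python
import math

def pretrain_lr(epoch, base_lr=0.06, total_epochs=200):
    """Cosine-annealed learning rate for JCL pre-training, per the quoted setup."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

def linear_eval_lr(epoch, base_lr=30.0, milestones=(60, 80), gamma=0.1):
    """Step schedule for the linear classifier: decay by 0.1 at the 60th and 80th epoch."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# e.g. pretrain_lr(0) == 0.06, pretrain_lr(100) == 0.03, linear_eval_lr(85) == 0.3
```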
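
Finally, the linear-classification evaluation referenced in the Research Type row follows the standard protocol of [8, 18]: freeze the pre-trained backbone and train only a linear classifier on its features. The sketch below follows common practice rather than anything stated explicitly in the excerpts; the 2048-d pooled ResNet-50 features and the SGD optimizer are assumptions, while lr = 30 matches the quoted value.

```python
import torch
import torch.nn as nn

def linear_probe_step(backbone, classifier, optimizer, images, labels):
    """One linear-evaluation step: frozen backbone, trainable linear head."""
    backbone.eval()                      # keep backbone (and its BN statistics) fixed
    with torch.no_grad():
        feats = backbone(images)         # e.g. (N, 2048) pooled ResNet-50 features
    logits = classifier(feats)
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# classifier = nn.Linear(2048, 1000)     # ImageNet1K has 1,000 classes
# optimizer = torch.optim.SGD(classifier.parameters(), lr=30.0, momentum=0.9)
```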