Joint Contrastive Learning with Infinite Possibilities

Authors: Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate these proposals on multiple benchmarks, demonstrating considerable improvements over existing algorithms. In this section, we empirically evaluate and analyze the hypotheses that directly emanated from the design of JCL. Specifically, we perform the pre-training on the ImageNet1K [10] dataset that contains 1.2M images evenly distributed across 1,000 classes. Following the protocols in [8, 18], we verify the effectiveness of JCL pre-trained features via the following evaluations: 1) Linear classification accuracy on ImageNet1K. 2) Generalization capability of features when transferred to alternative downstream tasks, including object detection [5, 39], instance segmentation [19] and keypoint detection [19] on the MS COCO [31] dataset. 3) Ablation studies that reveal the effectiveness of each component in our losses. 4) Statistical analysis on features that validates our hypothesis and proposals in the previous sections.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, Hefei, China; 2 JD AI Research, Beijing, China
Pseudocode | Yes | Algorithm 1 summarizes the algorithmic flow of the JCL procedure. (A hedged sketch of this objective appears after the table.)
Open Source Code | Yes | Code is publicly available at: https://github.com/caiqi/Joint-Contrastive-Learning.
Open Datasets | Yes | We perform the pre-training on the ImageNet1K [10] dataset that contains 1.2M images evenly distributed across 1,000 classes. Downstream transfer experiments use the MS COCO [31] dataset.
Dataset Splits | Yes | We perform the pre-training on the ImageNet1K [10] dataset that contains 1.2M images evenly distributed across 1,000 classes. Following the protocols in [8, 18], we verify the effectiveness of JCL pre-trained features via the following evaluations: 1) Linear classification accuracy on ImageNet1K. For the hyper-parameters, we use positive key number M = 5, softmax temperature τ = 0.2 and λ = 4.0 in Eq.(8)... We train JCL for 200 epochs with an initial learning rate of lr = 0.06, and lr is gradually annealed following a cosine decay schedule [32]. The classifier is trained for 100 epochs, while the learning rate lr is decayed by 0.1 at the 60th and the 80th epoch respectively.
Hardware Specification | Yes | This queuing trick also allows for feasible training on a typical 8-GPU machine and achieves state-of-the-art learning performances. The batch size is set to N = 512, which enables applicable implementations on an 8-GPU machine. The training is performed on a 4-GPU machine and each GPU carries 4 images at a time. (The queuing trick is sketched after the table.)
Software Dependencies | No | The paper mentions software components and models like ResNet-50, Faster R-CNN, and FPN. However, it does not specify version numbers for these or for any other software dependencies (e.g., PyTorch version, CUDA version).
Experiment Setup | Yes | For the hyper-parameters, we use positive key number M = 5, softmax temperature τ = 0.2 and λ = 4.0 in Eq.(8)... The dimension of this embedding is d = 128 across all experiments. The batch size is set to N = 512, which enables applicable implementations on an 8-GPU machine. We train JCL for 200 epochs with an initial learning rate of lr = 0.06, and lr is gradually annealed following a cosine decay schedule [32]. The batch size is set as N = 256 and the learning rate lr = 30 at this stage... The classifier is trained for 100 epochs, while the learning rate lr is decayed by 0.1 at the 60th and the 80th epoch respectively. We train all models for 90k iterations, which is commonly referred to as the 1× schedule in [18]. We vary the number of positive keys used for the estimate of µ_{k_i^+} and Σ_{k_i^+}. We vary λ in the range of [0.0, 10.0]. The temperature τ [22] affects the flatness of the softmax function and the confidence of each positive pair. From Fig. 2(c), the optimal τ turns out to be around 0.2. (These settings are consolidated in a sketch after the table.)
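
The pseudocode and experiment-setup rows cite Algorithm 1 and the Eq.(8) loss with M positive keys per query, a softmax temperature τ and a weight λ. Below is a minimal PyTorch sketch of one plausible reading of that objective, assuming the M positive keys are summarized by their per-sample mean and covariance and that a λ-weighted variance term enters the log-sum-exp over queued negatives; the function and argument names (jcl_loss, pos_keys, neg_queue) are illustrative, and the exact form of Eq.(8) should be taken from the paper rather than from this sketch.

```python
import torch

def jcl_loss(q, pos_keys, neg_queue, tau=0.2, lam=4.0):
    """q: (N, d) queries; pos_keys: (N, M, d) positive keys; neg_queue: (K, d) negatives.
    All rows are assumed L2-normalized. Hypothetical reading of the JCL objective."""
    mu = pos_keys.mean(dim=1)                                   # (N, d) per-sample mean of the M positive keys
    centered = pos_keys - mu.unsqueeze(1)                       # (N, M, d)
    # q^T Sigma q computed without materializing the (d, d) covariance:
    # Sigma = centered^T centered / M, so q^T Sigma q = mean_m (centered_m . q)^2
    q_sigma_q = (torch.einsum('nmd,nd->nm', centered, q) ** 2).mean(dim=1)   # (N,)
    pos_align = (q * mu).sum(dim=1) / tau                       # q . mu / tau
    pos_logit = pos_align + lam * q_sigma_q / (2 * tau ** 2)    # variance-inflated positive logit
    neg_logits = q @ neg_queue.t() / tau                        # (N, K)
    # Bound-style loss: -q.mu/tau plus logsumexp over the positive and negative logits
    denom = torch.logsumexp(torch.cat([pos_logit.unsqueeze(1), neg_logits], dim=1), dim=1)
    return (denom - pos_align).mean()
```

Computing q^T Σ q through the centered keys keeps the cost at O(NMd) per batch instead of building a d × d covariance matrix for every sample.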
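
The hardware row credits a "queuing trick" with making training feasible on a typical 8-GPU machine. A MoCo-style FIFO memory of momentum-encoder keys is the standard form of such a trick; the sketch below assumes that design, and the class name NegativeQueue and the 65,536-entry default size are assumptions rather than values quoted from the paper.

```python
import torch
import torch.nn.functional as F

class NegativeQueue:
    """FIFO memory of momentum-encoder keys used as negatives, decoupling the
    number of negatives from the per-GPU batch size (MoCo-style queuing trick)."""

    def __init__(self, dim=128, size=65536):
        # Random unit vectors as a harmless initial fill; overwritten as training proceeds.
        self.buffer = F.normalize(torch.randn(size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def push(self, keys):
        """Enqueue the newest keys, overwriting the oldest entries in circular order."""
        n = keys.shape[0]
        idx = (self.ptr + torch.arange(n)) % self.buffer.shape[0]
        self.buffer[idx] = keys
        self.ptr = int((self.ptr + n) % self.buffer.shape[0])
```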
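
The dataset-splits and experiment-setup rows quote the training recipe piecemeal. The sketch below consolidates those numbers and spells out the two learning-rate rules they imply (cosine decay over the 200 pre-training epochs; ×0.1 steps at the 60th and 80th epochs for the linear classifier). The container names PRETRAIN and LINEAR_EVAL are hypothetical.

```python
import math

# Hyper-parameters quoted in the rows above; the dict names are illustrative.
PRETRAIN = dict(
    dataset="ImageNet1K", epochs=200, batch_size=512, base_lr=0.06,
    embedding_dim=128, positive_keys_M=5, temperature_tau=0.2, lambda_weight=4.0,
)
LINEAR_EVAL = dict(epochs=100, batch_size=256, base_lr=30.0, milestones=(60, 80), gamma=0.1)

def pretrain_lr(epoch):
    """Cosine-annealed learning rate over the 200 pre-training epochs."""
    return PRETRAIN["base_lr"] * 0.5 * (1.0 + math.cos(math.pi * epoch / PRETRAIN["epochs"]))

def linear_eval_lr(epoch):
    """Linear-classifier learning rate: multiplied by 0.1 at the 60th and 80th epochs."""
    drops = sum(epoch >= m for m in LINEAR_EVAL["milestones"])
    return LINEAR_EVAL["base_lr"] * LINEAR_EVAL["gamma"] ** drops
```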