Conditional Contrastive Learning with Kernel
Authors: Yao-Hung Hubert Tsai, Tianqin Li, Martin Q. Ma, Han Zhao, Kun Zhang, Louis-Philippe Morency, Ruslan Salakhutdinov
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments using weakly supervised, fair, and hard-negative contrastive learning, showing CCL-K outperforms state-of-the-art baselines. We conduct experiments on the conditional contrastive learning frameworks discussed in Section 2.2: Section 4.1 for weakly supervised contrastive learning, Section 4.2 for fair contrastive learning, and Section 4.3 for hard-negative contrastive learning. |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University, ²University of Illinois at Urbana-Champaign, ³Mohamed bin Zayed University of Artificial Intelligence. {yaohungt, tianqinl, qianlim, kunz1, morency, rsalakhu}@cs.cmu.edu, {hanzhao}@illinois.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the methods in text and mathematical formulations. |
| Open Source Code | Yes | Code available at: https://github.com/Crazy-Jack/CCLK-release. |
| Open Datasets | Yes | 1) UT-Zappos (Yu and Grauman, 2014): it contains 50,025 shoe images over 21 shoe categories. 2) CUB (Wah et al., 2011): it contains 11,788 bird images spanning 200 fine-grained bird species, with 312 binary attributes attached to each image. 3) ImageNet-100 (Russakovsky et al., 2015): a subset of the ImageNet-1k dataset, containing 0.12 million images spanning 100 categories. 4) CIFAR-10 (Krizhevsky et al., 2009): it contains 60,000 images spanning 10 classes, e.g., automobile, plane, or dog. 5) ColorMNIST: we synthetically create the ColorMNIST dataset, which randomly assigns a continuous RGB color value to the background of each handwritten digit image in the MNIST dataset (LeCun et al., 1998); a sketch of this colorization procedure appears after the table. UT-Zappos is attributed to Yu and Grauman (2014) and is available at http://vision.cs.utexas.edu/projects/finegrained/utzap50k. CUB-200-2011 was created by Wah et al. (2011) and is a fine-grained dataset of bird species; it can be downloaded from http://www.vision.caltech.edu/visipedia/CUB-200-2011.html. CIFAR-10 (Krizhevsky et al., 2009) is an object classification dataset with 60,000 32×32 images in 10 classes; it can be downloaded at https://www.cs.toronto.edu/~kriz/cifar.html. ImageNet-100 is a subset of the ImageNet-1K dataset, which comes from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012-2017 (Russakovsky et al., 2015); ILSVRC is for non-commercial research and educational purposes, and we refer to the ImageNet official site for more information: https://www.image-net.org/download.php. |
| Dataset Splits | Yes | We randomly split train/validation images at a 7:3 ratio, resulting in 35,017 training images and 15,008 validation images (a sketch of this random split appears after the table). We follow the original train-validation split, resulting in 5,994 training images and 5,794 validation images. We combine the original training and validation sets as our training set and use the original test set as our validation set; the resulting training set contains 6,871 images and the validation set contains 6,918 images. We use the training and test split from the original dataset. We follow the original MNIST train/test split, resulting in 60,000 training images and 10,000 testing images spanning 10 digit categories. The training split contains 128,783 images and the test split contains 5,000 images. |
| Hardware Specification | Yes | It takes a machine with 4 NVIDIA 1080 Ti GPUs 8 hours to finish the pretraining. For the second setting, where we train with batch size 512 for 1000 epochs, it takes a DGX-1 machine 48 hours to finish training. We use batch size 128 and train on 4 NVIDIA 1080 Ti GPUs. All experiments are trained for 200 epochs and require 53 hours of training on a DGX machine with 8 Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions using the LARS optimizer, Limited-memory BFGS (L-BFGS), and the OpenAI CLIP model, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In the pre-training stage, on the dataset's training split, we update the parameters of the feature encoder (i.e., g_θ(·) in Equation 2) using the contrastive learning objectives, e.g., InfoNCE (Equation 1), WeaklySup-InfoNCE (Equation 3), or WeaklySup-CCLK (Equation 7). We train 1000 epochs for all experiments with the LARS optimizer (base learning rate 1.5, scaled by batch size divided by 256) with batch size 152 on 4 NVIDIA 1080 Ti GPUs. All experiments are run with 1000 pretraining iterations and 500 L-BFGS fine-tuning steps. We use batch size 128. The first setting, reported in the main text, trains contrastive learning with batch size 256 for 400 epochs; in the second setting, we train with batch size 512 for 1000 epochs. We use the LARS optimizer for all CCL-K related experiments with base lr = 1.5 and base batch size 256. All experiments are trained for 200 epochs. (Sketches of the InfoNCE objective and the learning-rate scaling rule appear after the table.) |
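
The ColorMNIST construction quoted in the Open Datasets row (a random continuous RGB background color per MNIST digit) can be reproduced in a few lines. The sketch below is our illustration of the described procedure, not the authors' implementation; the function name `colorize_background`, the blending rule, and the seed are assumptions.

```python
# Hypothetical sketch of the ColorMNIST construction: each MNIST digit keeps
# its grayscale foreground while the background is filled with a randomly
# sampled continuous RGB color. Names and the blending rule are ours.
import numpy as np
from torchvision.datasets import MNIST

def colorize_background(digit: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """digit: (28, 28) uint8 grayscale image -> (28, 28, 3) uint8 RGB image."""
    color = rng.uniform(0.0, 1.0, size=3)                 # continuous RGB background color
    mask = (digit.astype(np.float32) / 255.0)[..., None]  # foreground intensity in [0, 1]
    background = np.ones((28, 28, 3), dtype=np.float32) * color
    blended = mask + (1.0 - mask) * background            # white digit over colored background
    return (np.clip(blended, 0.0, 1.0) * 255).astype(np.uint8)

rng = np.random.default_rng(0)
mnist = MNIST(root="./data", train=True, download=True)
colored = [colorize_background(np.array(img), rng) for img, _ in mnist]
```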
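
The 7:3 UT-Zappos train/validation split from the Dataset Splits row can be mirrored as below. The placeholder dataset and the fixed seed are assumptions for illustration; only the split sizes come from the paper.

```python
# Hypothetical 7:3 random train/validation split mirroring the UT-Zappos
# numbers quoted above (35,017 train / 15,008 validation out of 50,025).
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.zeros(50_025, 1))  # stand-in for 50,025 images
n_train = 35_017
n_val = len(dataset) - n_train                   # 15,008
train_set, val_set = random_split(
    dataset, [n_train, n_val], generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set))              # 35017 15008
```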
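
The Experiment Setup row references the InfoNCE objective (Equation 1) and a linear learning-rate scaling rule for LARS. Below is a minimal generic sketch of both, assuming Equation 1 is the standard InfoNCE loss; variable names and the temperature value are our assumptions, and this is a reference implementation rather than the authors' code.

```python
# Generic InfoNCE loss plus the linear LR scaling rule quoted above
# ("base learning rate 1.5, scaled by batch size divided by 256").
# Variable names and the temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def info_nce(z_anchor: torch.Tensor, z_positive: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """z_anchor, z_positive: (batch, dim) embeddings of two augmented views;
    every other sample in the batch acts as a negative."""
    z_anchor = F.normalize(z_anchor, dim=1)
    z_positive = F.normalize(z_positive, dim=1)
    logits = z_anchor @ z_positive.t() / temperature               # (batch, batch) similarities
    labels = torch.arange(z_anchor.size(0), device=logits.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Linear scaling of the LARS base learning rate with batch size:
base_lr, base_batch_size, batch_size = 1.5, 256, 152
lr = base_lr * batch_size / base_batch_size                        # ≈ 0.89 for batch size 152
```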