Integrating Prior Knowledge in Contrastive Learning with Kernel

Authors: Benoit Dufumier, Carlo Alberto Barbano, Robin Louiset, Edouard Duchesnay, Pietro Gori

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In an unsupervised setting, we empirically demonstrate that CL benefits from generative models to improve its representation both on natural and medical images. In a weakly supervised scenario, our framework outperforms other unconditional and conditional CL approaches. Source code is available at this https URL. ... We empirically show that our framework performs competitively with small batch size and benefits from the latest advances of generative models to learn a better representation than existing CL methods. 4. We show that we achieve SOTA results in the unsupervised and weakly supervised setting. ... 4. Experiments ... We empirically demonstrate the benefits of removing the coupling between positives and negatives in the original uniformity term of the InfoNCE loss (Wang & Isola, 2020) in Tables 1 and 2. We compare our approach with the baseline InfoNCE (Oord et al., 2019) and DC (Yeh et al., 2022).
Researcher Affiliation | Academia | (1) NeuroSpin, CEA, Université Paris-Saclay; (2) LTCI, Télécom Paris, IP Paris; (3) University of Turin.
Pseudocode | Yes | Algorithm 1: Pseudo-code for computing the decoupled uniformity loss L̂^de_unif ... Algorithm 2: PyTorch implementation of L̂^de_unif with kernel (a hedged PyTorch sketch of this loss is given below the table).
Open Source Code | Yes | Source code is available at this https URL.
Open Datasets | Yes | CIFAR (Krizhevsky et al., 2009): We use the original training/test split with 50000 and 10000 images respectively of size 32×32. STL-10 (Coates et al., 2011) ... CUB200-2011 (Wah et al., 2011) ... UT Zappos (Yu & Grauman, 2014) ... ImageNet100 (Deng et al., 2009; Tian et al., 2020) ... BHB (Dufumier et al., 2021) ... BIOBD (Hozer et al., 2021) ... CheXpert (Irvin et al., 2019) ... RandBits-CIFAR10 (Chen et al., 2021).
Dataset Splits | Yes | CIFAR (Krizhevsky et al., 2009): We use the original training/test split with 50000 and 10000 images respectively of size 32×32. ... STL-10 (Coates et al., 2011): In unsupervised pre-training, we use all labelled+unlabelled images (105000 images) for training and the remaining 8000 for test, with size 96×96. During linear evaluation, we only use the 5000 labelled training images for learning the weights (see the data-loading sketch below the table). ... CheXpert (Irvin et al., 2019) ... We use the official training set for our experiments, following (Huang et al., 2021; Irvin et al., 2019), and we test the models on the hold-out official validation split containing radiographs from 200 patients. ... For BIOBD, using a 5-fold leave-site-out CV
Hardware Specification | Yes | It notably allows a reasonable computational time, since we run all our experiments on a single server node with 4 V100 GPUs.
Software Dependencies | No | We provide a PyTorch implementation of the previous pseudo-code in Algorithm 2. ... For VAE, we use a PyTorch-Lightning pre-trained model for STL-10... We use SGD optimizer... For ImageNet100, we use a LARS (You et al., 2017) optimizer... For DCGAN, we optimize it using the Adam optimizer. No specific version numbers are mentioned for PyTorch, PyTorch-Lightning, or any other library.
Experiment Setup | Yes | Batch size. We always use a default batch size of 256 for all experiments on vision datasets and 64 for brain MRI datasets... Optimization. We use an SGD optimizer on small-scale vision datasets... with a base learning rate 0.3 × batch size/256 and a cosine scheduler. For ImageNet100, we use a LARS (You et al., 2017) optimizer with learning rate 0.02 × batch size and a cosine scheduler. In the Kernel Decoupled Uniformity loss, we set λ = 0.01 × batch size and t = 2. For SimCLR, we set the temperature to τ = 0.07 for all datasets, following (Yeh et al., 2022). Unless mentioned otherwise, we use 2 views for Decoupled Uniformity (both with and without kernel)... Training epochs. By default, we train the models for 400 epochs, unless mentioned otherwise... For linear evaluation... we cross-validate an ℓ2 penalty term in {0, 1e-2, 1e-3, 1e-4, 1e-5}, training this linear probe for 300 epochs with an initial learning rate of 0.1 decayed by 0.1 at each plateau. (A sketch of this optimization setup is given below the table.)
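
Note on the pseudocode row. As a rough illustration of what Algorithm 2 of the paper computes, the following is a minimal PyTorch sketch of a decoupled uniformity term with an optional kernel over prior representations. The function names, the use of a precomputed Gram matrix, and the default hyperparameter values are assumptions made for this sketch; it is not the authors' released code.

    import torch

    def decoupled_uniformity(centroids, t=2.0):
        # centroids: (n, d) tensor, one averaged embedding (centroid) per image,
        # computed over its augmented views. Pairs of distinct centroids that stay
        # close to each other are penalized, spreading them on the hypersphere.
        n = centroids.size(0)
        sq_dists = torch.cdist(centroids, centroids).pow(2)                  # (n, n)
        off_diag = ~torch.eye(n, dtype=torch.bool, device=centroids.device)  # i != j
        return torch.log(torch.exp(-t * sq_dists[off_diag]).mean())

    def kernel_decoupled_uniformity(centroids, prior_gram, lam=0.1, t=2.0):
        # prior_gram: (n, n) Gram matrix k(z_i, z_j) on prior information z
        # (e.g. generative-model representations or weak attributes).
        # Each centroid is replaced by a kernel-weighted estimate (in the spirit
        # of a conditional mean embedding) before the uniformity term is applied.
        n = centroids.size(0)
        reg = prior_gram + n * lam * torch.eye(n, device=prior_gram.device)
        alpha = torch.linalg.solve(reg, prior_gram)       # (n, n) weights
        smoothed = alpha.t() @ centroids                  # (n, d)
        return decoupled_uniformity(smoothed, t=t)

A Gaussian kernel on the prior representations would be a natural choice for prior_gram, but the exact kernel, λ and t should be taken from the paper and its released code rather than from this sketch.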
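Note on the dataset-splits row. The STL-10 protocol quoted above maps directly onto standard torchvision split names; the snippet below is only a sketch of that mapping, assuming torchvision is the data-loading library (the excerpts do not say which one the authors use).

    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()

    # Unsupervised pre-training: all labelled + unlabelled images
    # (105000 images of size 96x96).
    pretrain_set = datasets.STL10("data", split="train+unlabeled",
                                  download=True, transform=to_tensor)

    # Linear evaluation: only the 5000 labelled training images are used to fit
    # the linear probe; the 8000 test images are used for evaluation.
    probe_train_set = datasets.STL10("data", split="train", transform=to_tensor)
    test_set = datasets.STL10("data", split="test", transform=to_tensor)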
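Note on the experiment-setup row. The sketch below shows one way to wire up the quoted small-scale vision settings (SGD, base learning rate 0.3 × batch size/256, cosine schedule, 400 epochs) in PyTorch. The model is a placeholder, the momentum value is an assumption not quoted in this report, and the LARS optimizer used for ImageNet100 is not shown.

    import torch

    batch_size = 256                  # default for vision datasets (64 for brain MRI)
    epochs = 400                      # default number of pre-training epochs
    base_lr = 0.3 * batch_size / 256  # learning-rate scaling quoted above

    # Placeholder encoder; the real model is the paper's backbone + projection head.
    model = torch.nn.Linear(512, 128)

    # Momentum (and any weight decay) are not quoted here and should be checked
    # against the released code; momentum=0.9 is only a common default.
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    for epoch in range(epochs):
        # ... one pass over the pre-training data, minimizing the (kernel)
        # decoupled uniformity objective with 2 views per image ...
        scheduler.step()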