Max-Margin Contrastive Learning

Authors: Anshul Shah, Suvrit Sra, Rama Chellappa, Anoop Cherian

AAAI 2022 (pp. 8220-8230)

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our approach on standard vision benchmark datasets, demonstrating better performance in unsupervised representation learning over state-of-the-art, while having better empirical convergence properties."
Researcher Affiliation | Collaboration | 1 Johns Hopkins University, Baltimore, MD; 2 Massachusetts Institute of Technology, Cambridge, MA; 3 Mitsubishi Electric Research Labs, Cambridge, MA
Pseudocode | Yes | "In Algorithm 1, we provide a pseudocode highlighting the key steps in our approach."
Open Source Code | Yes | Code: https://github.com/anshulbshah/MMCL
Open Datasets | Yes | "We present experiments on standard computer vision datasets such as ImageNet-1K, ImageNet-100, STL-10, CIFAR-100, and UCF-101, demonstrating superior performances against state of the art, while requiring only smaller negative batches."
Dataset Splits | Yes | "We use the standard ten-fold cross validation using an SVM and report the average performances and their standard deviations. We train this linear layer for 100 epochs." (see the evaluation sketch after this table)
Hardware Specification | Yes | "These experiments are done on ImageNet-1K with each RTX 3090 GPU holding 64 images."
Software Dependencies | No | The paper mentions software components such as a ResNet-50 backbone, an MLP, the LARS optimizer, and the Adam optimizer, but it does not specify version numbers for any libraries or frameworks (e.g., PyTorch, TensorFlow, scikit-learn) required for reproduction.
Experiment Setup | Yes | "We pretrain our models on ImageNet-1K (Deng et al. 2009) using the LARS optimizer (You, Gitman, and Ginsburg 2018) with an initial learning rate of 1.2 for 100 epochs. We use the Adam optimizer with a learning rate of 1e-3 as in (Chuang et al. 2020; Robinson et al. 2021). Unless otherwise stated, we use a batch-size of 256 for all ImageNet-1K, CIFAR-100, and STL-10 experiments and 128 for ImageNet-100 experiments. For CIFAR-100 experiments, we start with a kernel bandwidth σ² = 0.02 and increase it by a factor of 10 at 75 and 125 epochs. For STL-10 experiments, we use a kernel bandwidth σ² = 1. We used σ² = 5 for ImageNet experiments. We set the SVM slack regularization C to 100. For the projected gradient descent optimizer for MMCL, we use a maximum of 1000 steps."
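
The Experiment Setup row packs many hyperparameters into a single quote. As a reading aid, here is a minimal sketch that collects the stated values into one Python configuration and implements the CIFAR-100 bandwidth schedule (σ² multiplied by 10 at epochs 75 and 125). The dictionary keys and the helper name `bandwidth_at_epoch` are illustrative assumptions, not identifiers from the authors' released code.

```python
# Hedged sketch: the hyperparameters quoted above, gathered in one place.
# Keys and helper names are illustrative, not from the MMCL repository.
MMCL_SETUP = {
    # ImageNet-1K pretraining
    "imagenet1k_optimizer": "LARS",
    "imagenet1k_lr": 1.2,
    "imagenet1k_epochs": 100,
    # Adam setting quoted for the remaining experiments (as in Chuang et al. 2020)
    "adam_lr": 1e-3,
    # Batch sizes
    "batch_size_default": 256,      # ImageNet-1K, CIFAR-100, STL-10
    "batch_size_imagenet100": 128,
    # Kernel bandwidths (sigma^2)
    "sigma_sq_cifar100": 0.02,      # increased 10x at epochs 75 and 125
    "sigma_sq_stl10": 1.0,
    "sigma_sq_imagenet": 5.0,
    # SVM / optimization
    "svm_slack_C": 100,
    "pgd_max_steps": 1000,          # projected gradient descent for MMCL
}

def bandwidth_at_epoch(epoch, base_sq=0.02, milestones=(75, 125)):
    """CIFAR-100 schedule: multiply sigma^2 by 10 at each milestone epoch."""
    sigma_sq = base_sq
    for m in milestones:
        if epoch >= m:
            sigma_sq *= 10.0
    return sigma_sq
```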
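
The Dataset Splits row describes ten-fold cross validation with an SVM on top of the learned representations, reported as mean and standard deviation. Below is a minimal sketch of that evaluation protocol using scikit-learn; the names `svm_tenfold_eval`, `features`, and `labels` are hypothetical placeholders, and the LinearSVC settings are assumptions rather than the paper's exact evaluation code.

```python
# Minimal sketch of ten-fold SVM cross validation on frozen features,
# reporting mean accuracy and its standard deviation.
# `features` / `labels` are placeholders for encoder embeddings and class labels.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def svm_tenfold_eval(features: np.ndarray, labels: np.ndarray, C: float = 1.0):
    clf = LinearSVC(C=C, max_iter=10000)
    scores = cross_val_score(clf, features, labels, cv=10)
    return scores.mean(), scores.std()

# Usage (hypothetical): mean_acc, std_acc = svm_tenfold_eval(feats, ys)
```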