Max-Margin Contrastive Learning
Authors: Anshul Shah, Suvrit Sra, Rama Chellappa, Anoop Cherian
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach on standard vision benchmark datasets, demonstrating better performance in unsupervised representation learning over state-of-the-art, while having better empirical convergence properties. |
| Researcher Affiliation | Collaboration | (1) Johns Hopkins University, Baltimore, MD; (2) Massachusetts Institute of Technology, Cambridge, MA; (3) Mitsubishi Electric Research Labs, Cambridge, MA |
| Pseudocode | Yes | In Algorithm 1, we provide a pseudocode highlighting the key steps in our approach. |
| Open Source Code | Yes | Code: https://github.com/anshulbshah/MMCL |
| Open Datasets | Yes | We present experiments on standard computer vision datasets such as ImageNet-1K, ImageNet-100, STL-10, CIFAR-100, and UCF-101, demonstrating superior performances against state of the art, while requiring only smaller negative batches. |
| Dataset Splits | Yes | We use the standard ten-fold cross validation using an SVM and report the average performances and their standard deviations. We train this linear layer for 100 epochs. |
| Hardware Specification | Yes | These experiments are done on ImageNet-1K with each RTX3090 GPU holding 64 images. |
| Software Dependencies | No | The paper mentions software components like ResNet-50 backbone, MLP, LARS optimizer, and Adam optimizer, but it does not specify version numbers for any libraries or frameworks (e.g., PyTorch, TensorFlow, scikit-learn) required for reproduction. |
| Experiment Setup | Yes | We pretrain our models on ImageNet-1K (Deng et al. 2009) using the LARS optimizer (You, Gitman, and Ginsburg 2018) with an initial learning rate of 1.2 for 100 epochs. We use the Adam optimizer with a learning rate of 1e-3 as in (Chuang et al. 2020; Robinson et al. 2021). Unless otherwise stated, we use a batch size of 256 for all ImageNet-1K, CIFAR-100, and STL-10 experiments and 128 for ImageNet-100 experiments. For CIFAR-100 experiments, we start with a kernel bandwidth σ² = 0.02 and increase it by a factor of 10 at 75 and 125 epochs. For STL-10 experiments, we use a kernel bandwidth σ² = 1. We used σ² = 5 for ImageNet experiments. We set the SVM slack regularization C to 100. For the projected gradient descent optimizer for MMCL, we use a maximum of 1000 steps. |
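
To make the "Experiment Setup" row easier to act on, the sketch below collects the quoted hyperparameters into a small Python config plus a kernel-bandwidth schedule helper. It is a minimal illustration, not the authors' released code: the dataclass, function names, and dataset keys are assumptions for readability, and the actual implementation is in the repository linked above.

```python
# Sketch of the pretraining settings quoted in the table above.
# Names and structure here are illustrative assumptions; the real
# configuration lives at https://github.com/anshulbshah/MMCL.
from dataclasses import dataclass


@dataclass
class MMCLConfig:
    dataset: str               # e.g. "imagenet1k", "imagenet100", "cifar100", "stl10"
    epochs: int = 100          # 100-epoch pretraining quoted for ImageNet-1K
    batch_size: int = 256      # 128 for ImageNet-100 per the paper
    lars_lr: float = 1.2       # ImageNet-1K pretraining (LARS optimizer)
    adam_lr: float = 1e-3      # smaller-scale experiments (Adam optimizer)
    svm_C: float = 100.0       # SVM slack regularization C
    pgd_max_steps: int = 1000  # projected gradient descent steps for MMCL


def kernel_bandwidth(dataset: str, epoch: int) -> float:
    """Return the RBF kernel bandwidth sigma^2 following the quoted schedule."""
    if dataset == "cifar100":
        # Start at 0.02 and multiply by 10 at epochs 75 and 125.
        sigma2 = 0.02
        if epoch >= 75:
            sigma2 *= 10
        if epoch >= 125:
            sigma2 *= 10
        return sigma2
    if dataset == "stl10":
        return 1.0
    # ImageNet-1K and ImageNet-100 experiments use sigma^2 = 5.
    return 5.0


if __name__ == "__main__":
    cfg = MMCLConfig(dataset="cifar100")
    print(cfg)
    print("sigma^2 at epoch 80:", kernel_bandwidth(cfg.dataset, epoch=80))  # 0.2
```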
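
The "Dataset Splits" row also mentions a ten-fold evaluation with an SVM on the learned features. The snippet below is a generic approximation of that protocol using scikit-learn: it runs ten-fold cross-validation with a linear SVM and reports mean accuracy with its standard deviation. Note that the paper's "standard ten-fold" procedure may rely on the dataset's predefined folds rather than random splits, and the feature extraction and SVM settings here are placeholders.

```python
# Generic ten-fold SVM evaluation on frozen features (not the authors' code).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC


def evaluate_features(features: np.ndarray, labels: np.ndarray) -> tuple[float, float]:
    """Ten-fold CV accuracy of a linear SVM trained on frozen encoder features."""
    clf = LinearSVC()  # SVM hyperparameters left at defaults; purely illustrative
    scores = cross_val_score(clf, features, labels, cv=10, scoring="accuracy")
    return scores.mean(), scores.std()


if __name__ == "__main__":
    # Random stand-in features and labels, only to show the call signature.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 128))
    labs = rng.integers(0, 5, size=200)
    mean_acc, std_acc = evaluate_features(feats, labs)
    print(f"accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```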