Do More Negative Samples Necessarily Hurt In Contrastive Learning?
Authors: Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. Along the way, we give a structural characterization of the optimal representation in our framework, for noise contrastive estimation. We also provide empirical support for our theoretical results on CIFAR-10 and CIFAR-100 datasets. |
| Researcher Affiliation | Industry | Pranjal Awasthi¹, Nishanth Dikkala¹, Pritish Kamath¹ (¹Google Research, USA). Correspondence to: Nishanth Dikkala <nishanthd@google.com>, Pritish Kamath <pritish@alum.mit.edu>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statement about releasing its own source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We also provide empirical support for our theoretical results on CIFAR-10 and CIFAR-100 datasets. |
| Dataset Splits | No | The paper states 'provide 5000 (500) train examples per class and 1000 (100) test examples per class respectively', which specifies train and test splits, but it does not mention or quantify a validation dataset split. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware, such as GPU models, CPU types, or cloud computing instance specifications, used to run the experiments. |
| Software Dependencies | No | The paper mentions using a 'ResNet-18/50 architecture', 'logistic loss for training', and 'LARS optimizer', but it does not specify any version numbers for these or other software components or libraries. |
| Experiment Setup | Yes | We train a ResNet-18/50 architecture with a projection head as our encoder... We use the logistic loss for training and we train for 400 epochs... We fix the mini-batch size to be 10000, and perform 1000 steps of projected gradient descent with an initial step size of 50. (A hedged sketch of this setup appears below the table.) |
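
The Experiment Setup row describes a contrastive training configuration (ResNet-18/50 encoder with a projection head, logistic loss over negative samples). The minimal PyTorch sketch below illustrates what such a setup could look like. The projection dimension, the number of negatives `k`, the input sizes, the optimizer (plain SGD standing in for LARS), and the exact loss normalization are assumptions for illustration; the paper releases no code, so this is not the authors' implementation.

```python
# Hypothetical sketch of a contrastive setup like the one quoted above:
# ResNet-18 backbone + projection head, logistic contrastive loss with k negatives.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class Encoder(nn.Module):
    """ResNet-18 backbone followed by a two-layer projection head (sizes assumed)."""

    def __init__(self, proj_dim: int = 128):
        super().__init__()
        self.backbone = resnet18()
        feat_dim = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()          # expose backbone features
        self.head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        return self.head(self.backbone(x))


def logistic_contrastive_loss(z_anchor, z_pos, z_negs):
    """Logistic contrastive loss with k negatives (natural-log form):
    mean over the batch of log(1 + sum_i exp(z·z_i^- - z·z^+)).

    Shapes: z_anchor, z_pos are (B, d); z_negs is (B, k, d).
    """
    pos_sim = (z_anchor * z_pos).sum(dim=-1, keepdim=True)        # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", z_anchor, z_negs)        # (B, k)
    # logsumexp over {0} ∪ {neg_sim - pos_sim} implements log(1 + Σ exp(·)).
    logits = torch.cat([torch.zeros_like(pos_sim), neg_sim - pos_sim], dim=1)
    return torch.logsumexp(logits, dim=1).mean()


# One illustrative step on random tensors; real training would use augmented
# CIFAR views as anchor/positive pairs and k other images as negatives.
encoder = Encoder()
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1, momentum=0.9)

x, x_pos = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
x_negs = torch.randn(8, 4, 3, 32, 32)                             # k = 4 negatives
z, z_pos = encoder(x), encoder(x_pos)
z_negs = encoder(x_negs.flatten(0, 1)).view(8, 4, -1)

optimizer.zero_grad()
loss = logistic_contrastive_loss(z, z_pos, z_negs)
loss.backward()
optimizer.step()
```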