Optimal Sample Complexity of Contrastive Learning

Authors: Noga Alon, Dmitrii Avdiukhin, Dor Elboim, Orr Fischer, Grigory Yaroslavtsev

ICLR 2024

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We further show that the theoretical bounds on sample complexity obtained via VC/Natarajan dimension can have strong predictive power for experimental results, in contrast with the folklore belief about a substantial gap between the statistical learning theory and the practice of deep learning." (Abstract) and "To verify that our results indeed correctly predict the sample complexity, we perform experiments on several popular image datasets: CIFAR-10/100 and MNIST/Fashion MNIST. We find the representations for these images using ResNet18 trained from scratch using various contrastive losses. Our experiments show that for a fixed number of samples, the error rate is well approximated by the value predicted by our theory. We present our findings in Appendix F." (Section 1.1)

Researcher Affiliation | Academia | Princeton University, Northwestern University, Institute for Advanced Study, Weizmann Institute of Science, George Mason University

Pseudocode | No | The paper describes mathematical proofs and algorithms in prose but does not include any clearly labeled pseudocode or algorithm blocks.

Open Source Code | No | The paper neither states unambiguously that the authors are releasing code for the described methodology nor provides a direct link to such a repository.

Open Datasets | Yes | "we perform experiments on several popular image datasets: CIFAR-10/100 and MNIST/Fashion MNIST" (Section 1.1), "We train the model from scratch on CIFAR-10 (Krizhevsky, 2009) and Fashion-MNIST (Xiao et al., 2017) datasets", and "We train the model from scratch on the MNIST (LeCun, 1998) and CIFAR-100 (Krizhevsky, 2009) datasets" (Appendix F)

Dataset Splits | Yes | "The neural network is trained from scratch for 100 epochs using a set of m ∈ {10^2, 10^3, 10^4} training samples, and is evaluated on a different test set of 10^4 triplets from the same distribution" and "We perform experiments on the training set of CIFAR-10 and the validation set of ImageNet by training ResNet-18 from scratch on m ∈ {2, 10, 10^2, 10^3, 10^4, 10^5} randomly sampled triplets, and evaluating the model on the 10^4 triplets sampled from the same distribution" (Appendix F)

Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper.

Software Dependencies | Yes | "We express our thanks to the FFCV library (Leclerc et al., 2022), which allowed us to significantly speed up the execution" (Appendix F)

Experiment Setup | Yes | "The neural network is trained from scratch for 100 epochs using a set of m ∈ {10^2, 10^3, 10^4} training samples"; the model is trained "using the marginal triplet loss (Schroff et al., 2015b)" L_MT(x, y^+, z^-) = max(0, ||x - y^+||^2 - ||x - z^-||^2 + 1) and the contrastive loss L_C(x, y^+, z^-_1, ..., z^-_k) = -log( exp(x^T y^+) / (exp(x^T y^+) + Σ_{i=1}^k exp(x^T z^-_i)) ) (Appendix F)
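For concreteness, the two contrastive losses quoted from Appendix F can be sketched in NumPy. This is an illustrative reimplementation from the quoted formulas only, not the authors' code; the function names, the `margin` parameter default, and the array layout for negatives are assumptions.

```python
import numpy as np

def marginal_triplet_loss(x, y_pos, z_neg, margin=1.0):
    """Marginal triplet loss (Schroff et al., 2015b) as quoted:
    max(0, ||x - y+||^2 - ||x - z-||^2 + margin), with margin = 1 in the paper.
    x, y_pos, z_neg are embedding vectors of the anchor, positive, and negative."""
    gap = np.sum((x - y_pos) ** 2) - np.sum((x - z_neg) ** 2) + margin
    return max(0.0, float(gap))

def contrastive_loss(x, y_pos, z_negs):
    """(k+1)-tuple contrastive loss as quoted:
    -log( exp(x^T y+) / (exp(x^T y+) + sum_{i=1}^k exp(x^T z_i^-)) ).
    z_negs is a (k, d) array holding the k negative embeddings."""
    pos = np.exp(x @ y_pos)      # similarity score of the positive pair
    negs = np.exp(z_negs @ x)    # one score per negative example
    return float(-np.log(pos / (pos + negs.sum())))
```

When the anchor is closer to the positive than to the negative by at least the margin, the triplet loss is zero; the contrastive loss is always positive and decreases as the positive score dominates the negatives.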