Optimal Sample Complexity of Contrastive Learning
Authors: Noga Alon, Dmitrii Avdiukhin, Dor Elboim, Orr Fischer, Grigory Yaroslavtsev
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further show that the theoretical bounds on sample complexity obtained via VC/Natarajan dimension can have strong predictive power for experimental results, in contrast with the folklore belief about a substantial gap between the statistical learning theory and the practice of deep learning. (Abstract) and Experimental results: To verify that our results indeed correctly predict the sample complexity, we perform experiments on several popular image datasets: CIFAR-10/100 and MNIST/Fashion-MNIST. We find the representations for these images using ResNet18 trained from scratch using various contrastive losses. Our experiments show that for a fixed number of samples, the error rate is well approximated by the value predicted by our theory. We present our findings in Appendix F. (Section 1.1) |
| Researcher Affiliation | Academia | 1Princeton University, 2Northwestern University, 3Institute for Advanced Study, 4Weizmann Institute of Science, 5George Mason University |
| Pseudocode | No | The paper describes mathematical proofs and algorithms in prose but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing their code for the described methodology, nor does it provide a direct link to such a repository. |
| Open Datasets | Yes | we perform experiments on several popular image datasets: CIFAR-10/100 and MNIST/Fashion MNIST. (Section 1.1) and We train the model from scratch on CIFAR-10 (Krizhevsky, 2009) and Fashion-MNIST (Xiao et al., 2017) datasets (Appendix F) and We train the model from scratch on the MNIST (Yann, 1998) and CIFAR-100 (Krizhevsky, 2009) datasets (Appendix F). |
| Dataset Splits | Yes | The neural network is trained from scratch for 100 epochs using a set of m ∈ {10², 10³, 10⁴} training samples, and is evaluated on a different test set of 10⁴ triplets from the same distribution. (Appendix F) and We perform experiments on the training set of CIFAR-10 and the validation set of ImageNet by training ResNet-18 from scratch on m ∈ {2, 10, 10², 10³, 10⁴, 10⁵} randomly sampled triplets, and evaluating the model on the 10⁴ triplets sampled from the same distribution (Appendix F). |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper. |
| Software Dependencies | Yes | We express our thanks to the FFCV library (Leclerc et al., 2022) which allowed us to significantly speed up the execution (Appendix F). |
| Experiment Setup | Yes | The neural network is trained from scratch for 100 epochs using a set of m ∈ {10², 10³, 10⁴} training samples (Appendix F), We train the model from scratch on CIFAR-10 ... using the marginal triplet loss (Schroff et al., 2015b) L_MT(x, y⁺, z⁻) = max(0, ‖x − y⁺‖² − ‖x − z⁻‖² + 1) (Appendix F), and L_C(x, y⁺, z⁻₁, …, z⁻ₖ) = −log( exp(xᵀy⁺) / (exp(xᵀy⁺) + Σᵢ₌₁ᵏ exp(xᵀz⁻ᵢ)) ) (Appendix F). |
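The two losses quoted in the Experiment Setup row can be sketched directly from their formulas. The NumPy snippet below is a minimal illustration, not the authors' code; the function names and the `margin` parameter name are illustrative (the paper's marginal triplet loss fixes the margin at 1).

```python
import numpy as np

def marginal_triplet_loss(x, y_pos, z_neg, margin=1.0):
    """L_MT(x, y+, z-) = max(0, ||x - y+||^2 - ||x - z-||^2 + margin)."""
    return max(0.0, np.sum((x - y_pos) ** 2) - np.sum((x - z_neg) ** 2) + margin)

def contrastive_loss(x, y_pos, z_negs):
    """L_C(x, y+, z-_1..k) = -log( exp(x^T y+) / (exp(x^T y+) + sum_i exp(x^T z-_i)) )."""
    pos = np.exp(x @ y_pos)
    negs = sum(np.exp(x @ z) for z in z_negs)
    return -np.log(pos / (pos + negs))
```

For instance, with x = y⁺ = (1, 0) and z⁻ = (0, 1), the triplet loss is max(0, 0 − 2 + 1) = 0, i.e. the triplet is already satisfied with margin 1.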