Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
Authors: Zixin Wen, Yuanzhi Li
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we verified that the feature decoupling principle matches the underlying mechanism of contrastive learning in practice. Empirical evidence of our theory: we conduct multiple experiments to justify our theoretical statements, and the results indeed match our theory. We show that when no proper augmentation is applied to the data, the neural network will learn features with dense patterns, as shown in Figure 2, Figure 3 and Figure 4. |
| Researcher Affiliation | Academia | 1University of International Business and Economics, Beijing 2Carnegie Mellon University. Correspondence to: Zixin Wen <zixinw@andrew.cmu.edu>, Yuanzhi Li <yuanzhil@andrew.cmu.edu>. |
| Pseudocode | No | The paper describes the training algorithm in narrative text within Section 2.2 'Training algorithm using SGD' but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit code release statement, or mention of code in supplementary materials) for the described methodology. |
| Open Datasets | Yes | Figure 1. The difference between supervised features and contrastive features (in the higher layers of WideResNet 34x5 over CIFAR10). Figure 3. Evidence supporting our theoretical framework: the effects of augmentations on the learned representations of WideResNet 34x5 over CIFAR10, visualized via t-SNE. The differences between features learned under different augmentations show that the neural networks will indeed learn dense representations if the augmentation is not powerful enough. Figure 4. Further evidence supporting our theoretical framework: after adding color distortion to the augmentation, the neurons of AlexNet (2nd to 5th layer) exhibit sparser firing patterns over input images of CIFAR10. |
| Dataset Splits | No | The paper mentions using CIFAR-10 for empirical verification but does not explicitly state the dataset split information (e.g., specific percentages, sample counts, or a citation to a predefined split) used for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We use a single-layer neural net $f : \mathbb{R}^{d_1} \to \mathbb{R}^m$ with ReLU activation as our contrastive learner, where $m$ is the number of neurons. More precisely, it is defined as $f(x) = (h_1(x), \ldots, h_m(x)) \in \mathbb{R}^m$, with $h_i(x) = \mathrm{ReLU}(\langle w_i, x\rangle - b_i) - \mathrm{ReLU}(-\langle w_i, x\rangle - b_i)$. We initialize the parameters by $w_i^{(0)} \sim \mathcal{N}(0, \sigma_0^2 I_{d_1})$ and $b_i^{(0)} = 0$, where $\sigma_0^2 = \Theta\!\big(\tfrac{1}{d_1\,\mathrm{poly}(d)}\big)$ is small (and also theoretically friendly). For each iteration $t$, let $\eta = \tfrac{1}{\mathrm{poly}(d)}$ be the learning rate; we update as $w_i^{(t+1)} \leftarrow w_i^{(t)} - \eta\, \nabla_{w_i} \mathrm{Obj}(f_t)$. Let $m = d^{1.01}$ be the number of neurons, $\tau = \mathrm{polylog}(d)$, and $\lvert\mathcal{N}\rvert = \mathrm{poly}(d)$ be the number of negative samples. (A minimal code sketch of this setup follows the table.) |
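
The sketch below illustrates the quoted experiment setup: the single-layer symmetric-ReLU learner $f$, Gaussian initialization of the weights, and one plain SGD step on a contrastive objective. It is not the authors' code; the concrete sizes (`d1`, `m`, `num_neg`), the values of `eta`, `tau`, and `sigma0`, the Gaussian placeholder inputs (standing in for the paper's augmented data), and the InfoNCE-style surrogate used for `Obj(f_t)` are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Problem sizes -- illustrative stand-ins for d_1, m = d^{1.01}, and |N| = poly(d)
d1 = 256          # input dimension d_1
m = 512           # number of neurons m
num_neg = 128     # number of negative samples |N|
tau = 0.5         # temperature, stand-in for tau = polylog(d)
eta = 1e-3        # learning rate, stand-in for eta = 1/poly(d)
sigma0 = 1e-3 / d1 ** 0.5   # small init scale, stand-in for sigma_0^2 = Theta(1/(d_1 poly(d)))

# Parameters: w_i^{(0)} ~ N(0, sigma_0^2 I_{d_1}), b_i^{(0)} = 0 (the bias is kept fixed
# here, since the quoted setup only states the SGD update for the weights w_i).
W = (torch.randn(m, d1) * sigma0).requires_grad_(True)
b = torch.zeros(m)

def f(x):
    # f(x) = (h_1(x), ..., h_m(x)),  h_i(x) = ReLU(<w_i,x> - b_i) - ReLU(-<w_i,x> - b_i)
    z = x @ W.t()
    return F.relu(z - b) - F.relu(-z - b)

def contrastive_obj(x1, x2, x_neg):
    # Assumed InfoNCE-style surrogate for Obj(f_t): one positive pair (two augmented
    # views of the same input) scored against num_neg negatives, with temperature tau.
    f1, f2, fn = f(x1), f(x2), f(x_neg)
    pos = (f1 * f2).sum(dim=1, keepdim=True) / tau        # (B, 1)
    neg = f1 @ fn.t() / tau                               # (B, num_neg)
    logits = torch.cat([pos, neg], dim=1)
    return F.cross_entropy(logits, torch.zeros(len(x1), dtype=torch.long))

# One SGD step: w_i^{(t+1)} <- w_i^{(t)} - eta * grad_{w_i} Obj(f_t)
x1, x2 = torch.randn(32, d1), torch.randn(32, d1)   # placeholder augmented views
x_neg = torch.randn(num_neg, d1)                    # placeholder negative samples
loss = contrastive_obj(x1, x2, x_neg)
loss.backward()
with torch.no_grad():
    W -= eta * W.grad
    W.grad.zero_()
```

The exact contrastive objective, the sparse-coding data model, and the augmentation scheme analyzed in the paper are not reproduced here and should be taken from the paper itself; the sketch only mirrors the architecture, initialization, and weight update quoted in the Experiment Setup cell.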