Understanding Deep Contrastive Learning via Coordinate-wise Optimization
Authors: Yuandong Tian
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, this formulation, named Pairweighed Contrastive Learning (α-CL), when coupled with various regularization terms, yields novel contrastive losses that show comparable (or better) performance in CIFAR10 (Krizhevsky et al., 2009) and STL-10 (Coates et al., 2011). Initial experiments (Sec. 6) show that α-CL gives comparable (or even better) downstream performance in CIFAR10 and STL-10, compared to vanilla InfoNCE loss. We evaluate our α-CL framework (Def. 1) in CIFAR10 (Krizhevsky et al., 2009) and STL-10 (Coates et al., 2011) with ResNet18 (He et al., 2016), and compare the downstream performance of multiple losses. |
| Researcher Affiliation | Industry | Yuandong Tian, Meta AI (FAIR), yuandong@meta.com |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks, nor does it present structured steps formatted like code. |
| Open Source Code | Yes | Codes are available (footnote 1): https://github.com/facebookresearch/luckmatters/tree/main/ssl/real-dataset |
| Open Datasets | Yes | Empirically, this formulation... yields novel contrastive losses that show comparable (or better) performance in CIFAR10 (Krizhevsky et al., 2009) and STL-10 (Coates et al., 2011). We evaluate our α-CL framework (Def. 1) in CIFAR10 (Krizhevsky et al., 2009) and STL-10 (Coates et al., 2011) with ResNet18 (He et al., 2016), and compare the downstream performance of multiple losses, with regularizers taking the form R(α) = Σ_{j≠i} r(α_{ij}) with a constraint Σ_{j≠i} α_{ij} = 1. Tbl. 2 shows more experiments with different backbones (e.g., ResNet50) and more complicated datasets (e.g., CIFAR-100). (A minimal loss sketch illustrating this setup follows the table.) |
| Dataset Splits | No | The paper mentions using CIFAR10, STL-10, and CIFAR-100 datasets for experiments but does not explicitly provide information on train/validation/test splits (e.g., percentages, sample counts, or specific predefined split references for validation sets). |
| Hardware Specification | No | The paper states: 'Code is written in PyTorch and a single modern GPU suffices for the experiments.' This does not provide specific hardware details such as the GPU model, CPU type, or memory specifications. |
| Software Dependencies | No | The paper mentions 'Code is written in PyTorch' and states 'All training is performed with Adam (Kingma & Ba, 2014) optimizer,' but it does not specify version numbers for PyTorch or any other software libraries or dependencies. |
| Experiment Setup | Yes | Table 1: 'Batchsize 128. Top-1 accuracy with linear evaluation protocol. Temperature τ = 0.5 and learning rate is 0.01.' Table 2: 'For ResNet18, learning rate is 0.01; for ResNet50, learning rate is 0.001.' The paper also reports results at 100, 300, and 500 training epochs. (An illustrative training-loop skeleton follows the table.) |
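
For readers reproducing the setup, below is a minimal sketch of the vanilla InfoNCE-style contrastive loss that the paper uses as its baseline, written in PyTorch since that is the framework the authors report using. It is not the authors' released implementation (see the linked repository above); the softmax weights over negative pairs play the role of the per-pair importances α_ij that α-CL makes explicit, and the function and variable names are illustrative.

```python
# Minimal sketch (not the authors' released code): vanilla InfoNCE loss with
# temperature tau = 0.5, as quoted from Table 1 of the paper.
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, tau=0.5):
    """z1, z2: [N, d] embeddings of two augmented views of the same N images."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # [2N, d] stacked views
    sim = z @ z.t() / tau                     # pairwise cosine similarities / tau
    sim.fill_diagonal_(float('-inf'))         # exclude self-similarity terms
    n = z.shape[0]
    # the positive for row i is the other augmented view of the same image
    pos = torch.arange(n, device=z.device).roll(n // 2)
    # cross-entropy per row = -log softmax weight of the positive pair; the softmax
    # weights on the remaining (negative) pairs are the implicit per-pair
    # importances alpha_ij in the paper's coordinate-wise view
    return F.cross_entropy(sim, pos)
```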
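
And a sketch of a training loop matching the hyperparameters quoted above (batch size 128, Adam with learning rate 0.01, ResNet18 backbone, CIFAR10). The augmentation pipeline and projection head are simplified placeholders rather than the paper's exact recipe, and the loss is the `info_nce_loss` sketched above.

```python
# Illustrative skeleton only: batch size 128, Adam (lr = 0.01), ResNet18 on CIFAR10.
# Augmentations and the projection head are simplified placeholders.
import torch
import torchvision
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(32),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

class TwoViews:
    """Return two independently augmented views of the same image."""
    def __init__(self, t):
        self.t = t
    def __call__(self, x):
        return self.t(x), self.t(x)

train_set = torchvision.datasets.CIFAR10(root="data", train=True, download=True,
                                         transform=TwoViews(augment))
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True,
                                     drop_last=True)

# ResNet18 backbone; the final fc layer stands in for a projection head here.
model = torchvision.models.resnet18(num_classes=128)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for (v1, v2), _ in loader:                                # one epoch shown; the paper
    loss = info_nce_loss(model(v1), model(v2), tau=0.5)   # reports 100-500 epochs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```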