Understanding Deep Contrastive Learning via Coordinate-wise Optimization

Author: Yuandong Tian

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, this formulation, named Pair-weighed Contrastive Learning (α-CL), when coupled with various regularization terms, yields novel contrastive losses that show comparable (or better) performance in CIFAR10 (Krizhevsky et al., 2009) and STL-10 (Coates et al., 2011). Initial experiments (Sec. 6) show that α-CL gives comparable (or even better) downstream performance in CIFAR10 and STL-10, compared to vanilla InfoNCE loss. We evaluate our α-CL framework (Def. 1) in CIFAR10 (Krizhevsky et al., 2009) and STL-10 (Coates et al., 2011) with ResNet18 (He et al., 2016), and compare the downstream performance of multiple losses.
Researcher Affiliation | Industry | Yuandong Tian, Meta AI (FAIR), yuandong@meta.com
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks, nor does it present structured steps formatted like code.
Open Source Code | Yes | Codes are available at https://github.com/facebookresearch/luckmatters/tree/main/ssl/real-dataset
Open Datasets | Yes | Empirically, this formulation... yields novel contrastive losses that show comparable (or better) performance in CIFAR10 (Krizhevsky et al., 2009) and STL-10 (Coates et al., 2011). We evaluate our α-CL framework (Def. 1) in CIFAR10 (Krizhevsky et al., 2009) and STL-10 (Coates et al., 2011) with ResNet18 (He et al., 2016), and compare the downstream performance of multiple losses, with regularizers taking the form R(α) = Σ_{j≠i} r(α_ij) under the constraint Σ_{j≠i} α_ij = 1 (see the loss sketch after the table). Tbl. 2 shows more experiments with different backbones (e.g., ResNet50) and more complicated datasets (e.g., CIFAR-100).
Dataset Splits | No | The paper mentions using the CIFAR10, STL-10, and CIFAR-100 datasets for experiments but does not explicitly provide information on train/validation/test splits (e.g., percentages, sample counts, or specific predefined split references for validation sets).
Hardware Specification | No | The paper states: 'Code is written in PyTorch and a single modern GPU suffices for the experiments.' This does not provide specific hardware details such as the GPU model, CPU type, or memory specifications.
Software Dependencies | No | The paper mentions 'Code is written in PyTorch' and states 'All training is performed with Adam (Kingma & Ba, 2014) optimizer,' but it does not specify version numbers for PyTorch or any other software libraries or dependencies.
Experiment Setup | Yes | Table 1: 'Batchsize 128. Top-1 accuracy with linear evaluation protocol. Temperature τ = 0.5 and learning rate is 0.01.' Table 2: 'For ResNet18, learning rate is 0.01; for ResNet50, learning rate is 0.001.' The paper also reports training runs of 100, 300, and 500 epochs (see the training-setup sketch below).
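
To make the quoted α-CL formulation concrete, below is a minimal PyTorch sketch of a pair-weighted contrastive loss in the spirit of the paper's coordinate-wise min-max view: per-anchor weights α_ij over negative pairs satisfy Σ_{j≠i} α_ij = 1, and in the InfoNCE special case they are softmax weights over pairwise similarities. This is an illustrative reconstruction under those assumptions, not the code from the authors' repository; the name alpha_cl_loss and the stop-gradient treatment of α are hypothetical choices.

```python
import torch
import torch.nn.functional as F

def alpha_cl_loss(z1, z2, tau=0.5):
    """Pair-weighted contrastive loss sketch (hypothetical helper, not the paper's code).

    z1, z2: (N, d) embeddings of two augmented views of the same N images.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                      # s_ij: similarity of anchor i to view j
    pos = sim.diag()                             # s_ii: positive-pair similarities
    neg_mask = ~torch.eye(len(z1), dtype=torch.bool, device=z1.device)

    # Per-anchor weights alpha_ij over negatives, one distribution per row
    # (sum_{j != i} alpha_ij = 1). Softmax weights correspond to the InfoNCE
    # case; other regularizers R(alpha) would induce other weightings.
    # Assumption: alpha is held fixed for the network update (stop-gradient),
    # matching the coordinate-wise alternating view.
    with torch.no_grad():
        alpha = torch.softmax(sim.masked_fill(~neg_mask, float("-inf")), dim=1)

    # Minimize the weighted gap s_ij - s_ii: pull positive pairs together,
    # push the alpha-weighted negatives apart.
    gap = sim - pos.unsqueeze(1)
    return (alpha * gap)[neg_mask].sum() / len(z1)
```

For a quick smoke test, `alpha_cl_loss(torch.randn(128, 64), torch.randn(128, 64))` returns a scalar loss for a batch of 128 random 64-d embeddings.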
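The following is a compact sketch of the reported setup (CIFAR10, ResNet18, Adam, learning rate 0.01, batch size 128, temperature τ = 0.5), reusing alpha_cl_loss from the sketch above. The two-crop augmentation pipeline, the use of the ResNet fc layer as a 128-d projection head, and the 100-epoch loop are simplifying assumptions rather than the paper's exact configuration; the common CIFAR-style ResNet modifications (3×3 conv1, no max-pool) and the linear-evaluation step are omitted.

```python
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

class TwoCrops:
    """Return two independent augmentations of the same image."""
    def __init__(self, t):
        self.t = t
    def __call__(self, x):
        return self.t(x), self.t(x)

augment = transforms.Compose([
    transforms.RandomResizedCrop(32),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=TwoCrops(augment))
loader = DataLoader(train_set, batch_size=128, shuffle=True, drop_last=True)

model = torchvision.models.resnet18(num_classes=128)   # fc layer used as projection head
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):            # the paper reports 100 / 300 / 500 epoch runs
    for (x1, x2), _ in loader:      # labels are unused during contrastive pretraining
        loss = alpha_cl_loss(model(x1), model(x2), tau=0.5)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

After pretraining, the reported Top-1 numbers come from the linear evaluation protocol, i.e., training a linear classifier on frozen backbone features.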