Rethinking the Uniformity Metric in Self-Supervised Learning

Authors: Xianghong Fang, Jian Li, Qiang Sun, Benyou Wang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Integrating this new metric in existing self-supervised learning methods effectively mitigates dimensional collapse and consistently improves their performance on downstream tasks involving CIFAR-10 and CIFAR-100 datasets. Code is available at https://github.com/statsle/WassersteinSSL. [...] In this section, we integrate the proposed uniformity loss as an auxiliary term into various existing self-supervised methods. We then conduct experiments on CIFAR-10 and CIFAR-100 datasets to demonstrate its effectiveness. [...] Main Results: As depicted in Table 2, incorporating W2 as an additional loss consistently yields superior performance compared to models without this loss or those with LU as the additional term. (A hedged sketch of this W2 uniformity term is given after the table.)
Researcher Affiliation | Collaboration | Xianghong Fang, The Chinese University of Hong Kong, Shenzhen (fangxianghong2@gmail.com); Jian Li, Tencent AI Lab (lijianjack@gmail.com); Qiang Sun, University of Toronto & MBZUAI (qsunstats@gmail.com); Benyou Wang, The Chinese University of Hong Kong, Shenzhen & SRIBD (wangbenyou@cuhk.edu.cn)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; it describes methods in narrative text.
Open Source Code | Yes | Code is available at https://github.com/statsle/WassersteinSSL.
Open Datasets | Yes | Integrating this new metric in existing self-supervised learning methods effectively mitigates dimensional collapse and consistently improves their performance on downstream tasks involving CIFAR-10 and CIFAR-100 datasets. [...] We conduct experiments on CIFAR-10 and CIFAR-100 datasets to demonstrate its effectiveness.
Dataset Splits | Yes | Evaluation follows a linear evaluation protocol: models are pre-trained for 500 epochs, then a linear classifier is added and trained for 100 epochs while the learned representations are kept frozen. (A minimal linear-evaluation sketch is given after the table.)
Hardware Specification | Yes | To ensure fair comparisons, all experiments in Section 6 are conducted on a single 1080 GPU.
Software Dependencies | No | The paper mentions 'ResNet18 (He et al., 2016) as the backbone and a three-layer MLP as the projector', 'the LARS optimizer (You et al., 2017)', and a 'cosine decay learning rate schedule (Loshchilov & Hutter, 2017)'. It also states that, 'following da Costa et al. (2022), we set the temperature t = 0.2 for all contrastive learning methods'. However, it does not provide specific version numbers for software such as PyTorch, TensorFlow, CUDA, or other libraries that would be necessary for reproduction. (An illustrative backbone/projector composition is sketched after the table.)
Experiment Setup | Yes | The LARS optimizer (You et al., 2017) is employed with a base learning rate of 0.2 and a cosine decay learning rate schedule (Loshchilov & Hutter, 2017) for all models. Evaluation follows a linear evaluation protocol: models are pre-trained for 500 epochs, after which a linear classifier is added and trained for 100 epochs while the learned representations are preserved. The same augmentation strategy is deployed across all models, encompassing operations such as color distortion, rotation, and cutout. [...] Regarding the linear decay for weighting the quadratic Wasserstein distance, refer to Table 3 for the parameter settings. (A hedged training-setup sketch is given after the table.)
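
The W2 term referenced in the Research Type row is the quadratic Wasserstein distance the paper proposes as a uniformity metric. Below is a minimal sketch, assuming the term is the closed-form 2-Wasserstein distance between a Gaussian fitted to the l2-normalized embeddings and the isotropic Gaussian N(0, I/d) that approximates the uniform distribution on the unit hypersphere; the function name `wasserstein_uniformity` and the numerical details are assumptions, and the released code at https://github.com/statsle/WassersteinSSL may differ.

```python
import torch


def wasserstein_uniformity(z: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Closed-form 2-Wasserstein distance between a Gaussian fitted to the
    l2-normalized embeddings z (shape [N, d]) and N(0, I/d), which approximates
    the uniform distribution on the unit hypersphere S^{d-1}."""
    z = torch.nn.functional.normalize(z, dim=-1)        # project embeddings onto the unit sphere
    n, d = z.shape
    mu = z.mean(dim=0)                                  # empirical mean
    zc = z - mu
    sigma = zc.T @ zc / (n - 1)                         # empirical covariance, [d, d]
    # For a symmetric PSD matrix, tr(Sigma^{1/2}) = sum of sqrt(eigenvalues).
    eigvals = torch.linalg.eigvalsh(sigma).clamp_min(0.0)
    tr_sqrt_sigma = eigvals.clamp_min(eps).sqrt().sum()
    # W2^2(N(mu, Sigma), N(0, I/d)) = ||mu||^2 + tr(Sigma) + 1 - (2/sqrt(d)) * tr(Sigma^{1/2})
    w2_sq = mu.pow(2).sum() + eigvals.sum() + 1.0 - (2.0 / d ** 0.5) * tr_sqrt_sigma
    return w2_sq.clamp_min(0.0).sqrt()
```

In training, this term would be added to the base self-supervised loss with the (linearly decayed) weight the paper refers to in its Table 3, e.g. `loss = base_loss + w * wasserstein_uniformity(z)`.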
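
The Software Dependencies row names a ResNet18 backbone and a three-layer MLP projector but no library versions. The sketch below is one illustrative PyTorch/torchvision composition under those constraints; the projector widths, output dimension, and the CIFAR-style stem modification are assumptions, not values taken from the paper.

```python
import torch.nn as nn
from torchvision.models import resnet18


class SSLModel(nn.Module):
    """ResNet18 encoder + three-layer MLP projector (dimensions are illustrative assumptions)."""

    def __init__(self, proj_hidden_dim: int = 2048, proj_out_dim: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)
        feat_dim = backbone.fc.in_features              # 512 for ResNet18
        backbone.fc = nn.Identity()                     # drop the supervised classification head
        # Common CIFAR adaptation (an assumption here): 3x3 stem conv, no max-pooling.
        backbone.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        backbone.maxpool = nn.Identity()
        self.backbone = backbone
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, proj_hidden_dim), nn.BatchNorm1d(proj_hidden_dim), nn.ReLU(inplace=True),
            nn.Linear(proj_hidden_dim, proj_hidden_dim), nn.BatchNorm1d(proj_hidden_dim), nn.ReLU(inplace=True),
            nn.Linear(proj_hidden_dim, proj_out_dim),
        )

    def forward(self, x):
        h = self.backbone(x)     # representation used for linear evaluation
        z = self.projector(h)    # embedding fed to the SSL loss and the W2 term
        return h, z
```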
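
For the Experiment Setup row, the sketch below wires together an augmentation pipeline (color distortion, rotation, and cutout-style erasing), a cosine-decayed schedule with base learning rate 0.2 over 500 epochs, and a linearly decayed weight for the W2 term. LARS is not part of core PyTorch, so plain SGD is used here as a stand-in; all augmentation magnitudes and the weight ceiling `w_max` are assumptions (the paper's actual settings are listed in its Table 3). `SSLModel` is the class from the previous sketch.

```python
import torch.optim as optim
from torchvision import transforms

# Two-view augmentation pipeline; the exact magnitudes below are illustrative, not the paper's values.
augment = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),  # color distortion
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomRotation(degrees=15),                                         # rotation
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.2)),                            # cutout-style erasing
])

model = SSLModel()
epochs, base_lr = 500, 0.2
# The paper uses LARS (You et al., 2017); SGD is only a stand-in in this sketch.
optimizer = optim.SGD(model.parameters(), lr=base_lr, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)


def w2_weight(epoch: int, w_max: float = 1.0) -> float:
    """Linearly decayed weight for the W2 term; w_max is an illustrative placeholder,
    the paper's actual parameter settings are given in its Table 3."""
    return w_max * (1.0 - epoch / epochs)
```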
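
The Dataset Splits row describes a linear evaluation protocol: pre-train for 500 epochs, then train only a linear classifier for 100 epochs on top of frozen features. A minimal sketch follows; the classifier's optimizer, learning rate, and schedule are assumptions, since the report only specifies the epoch counts. `encoder` stands for the pre-trained backbone (e.g. `model.backbone` from the earlier sketch).

```python
import torch
import torch.nn as nn


def linear_evaluation(encoder: nn.Module, train_loader, test_loader,
                      feat_dim: int = 512, num_classes: int = 10,
                      epochs: int = 100, lr: float = 0.2, device: str = "cuda") -> float:
    """Freeze the pre-trained encoder and train only a linear classifier on top."""
    encoder.eval().to(device)
    for p in encoder.parameters():
        p.requires_grad_(False)                     # keep the learned representations fixed
    clf = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(clf.parameters(), lr=lr, momentum=0.9)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                h = encoder(x)                      # frozen features
            loss = nn.functional.cross_entropy(clf(h), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()

    # Accuracy on the held-out test split.
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            pred = clf(encoder(x)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total
```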