Equivariant Self-Supervised Learning: Encouraging Equivariance in Representations

Authors: Rumen Dangovski, Li Jing, Charlotte Loh, Seungwook Han, Akash Srivastava, Brian Cheung, Pulkit Agrawal, Marin Soljacic

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate E-SSL's effectiveness empirically on several popular computer vision benchmarks, e.g. improving SimCLR to 72.5% linear probe accuracy on ImageNet. Furthermore, we demonstrate the usefulness of E-SSL for applications beyond computer vision; in particular, we show its utility on regression problems in photonics science.
Researcher Affiliation | Collaboration | Rumen Dangovski, MIT EECS, rumenrd@mit.edu; Li Jing, Facebook AI Research, ljng@fb.com; Charlotte Loh, MIT EECS, cloh@mit.edu; Seungwook Han, MIT-IBM Watson AI Lab, sh3264@columbia.edu; Akash Srivastava, MIT-IBM Watson AI Lab, akashsri@mit.edu; Brian Cheung, MIT CSAIL & BCS, cheungb@mit.edu; Pulkit Agrawal, MIT CSAIL, pulkitag@mit.edu; Marin Soljačić, MIT Physics, soljacic@mit.edu
Pseudocode | Yes | Algorithm 1: PyTorch-style pseudocode for E-SSL, predicting four-fold rotations. (A hedged sketch of this objective is given after the table.)
Open Source Code | Yes | Our code, datasets and pre-trained models are available at https://github.com/rdangovs/essl to aid further research in E-SSL.
Open Datasets | Yes | in our experiments on standard computer vision data, such as the small-scale CIFAR-10 (Torralba et al., 2008; Krizhevsky, 2009) and the large-scale ImageNet (Deng et al., 2009)
Dataset Splits | Yes | We report the kNN accuracy in (%) on the validation set. (A minimal kNN-probe sketch is given after the table.)
Hardware Specification | No | The paper mentions 'HPC and consultation resources' and 'GPU hours' but does not provide specific hardware details such as exact GPU/CPU models or memory amounts.
Software Dependencies | No | The paper mentions 'PyTorch-style pseudocode' but does not specify version numbers for PyTorch or any other software dependencies required to replicate the experiments.
Experiment Setup | Yes | Our experiments use the following architectural choices: ResNet-18 backbone (the CIFAR-10 version has kernel size 3, stride 1, padding 1, and no max pooling afterwards); batch size 512 (only our baseline SimSiam model uses batch size 1024); base learning rate 0.03 for the baseline SimCLR and SimSiam and 0.06 for E-SimCLR and E-SimSiam; 800 pre-training epochs; standard cosine-decayed learning rate; 10 epochs of linear warmup; a two-layer projector with hidden dimension 2048 and output dimension 2048; for SimSiam, a two-layer (bottleneck) predictor with hidden dimension 512 whose learning rate is not decayed; the last batch normalization in the projector has no learnable affine parameters; weight decay 0.0005; SGD optimizer with momentum 0.9. The augmentation is random resized cropping with scale (0.2, 1.0), aspect ratio (3/4, 4/3) and size 32x32, random horizontal flips with probability 0.5, color jittering (0.4, 0.4, 0.4, 0.1) with probability 0.8, and grayscale with probability 0.2. (A sketch of this configuration is given after the table.)
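
The Pseudocode row refers to the paper's Algorithm 1, PyTorch-style pseudocode for E-SSL with four-fold rotation prediction. Below is a minimal, hedged sketch of that kind of objective: a SimCLR-style contrastive loss on two augmented views plus a weighted cross-entropy loss for predicting which of the four rotations was applied. The function names (rotate_four_fold, info_nce, essl_step), the lambda_rot weight, and the exact placement of the rotation head are illustrative assumptions, not the authors' Algorithm 1.

```python
# Hedged sketch of an E-SSL-style objective: invariance term on two views plus an
# equivariance term that predicts the applied four-fold rotation. Names and the
# InfoNCE helper are illustrative; see Algorithm 1 / the official repo for the
# authors' exact pseudocode.
import torch
import torch.nn.functional as F


def rotate_four_fold(x):
    """Return the 4 rotated copies of an NCHW batch and their rotation labels (0..3)."""
    rotations = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4, device=x.device).repeat_interleave(x.size(0))
    return torch.cat(rotations, dim=0), labels


def info_nce(z1, z2, temperature=0.5):
    """Minimal SimCLR-style contrastive loss between two batches of projections."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # (2N, d)
    sim = z @ z.t() / temperature                       # (2N, 2N) cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))               # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def essl_step(backbone, projector, rot_head, x1, x2, x_rot_src, lambda_rot=0.4):
    """One E-SSL-style loss; lambda_rot is the equivariance weight (value here is illustrative)."""
    # Invariance (standard SSL) term on the two augmented views.
    z1, z2 = projector(backbone(x1)), projector(backbone(x2))
    loss_ssl = info_nce(z1, z2)
    # Equivariance term: classify which rotation was applied to (lightly augmented) images.
    x_rot, rot_labels = rotate_four_fold(x_rot_src)
    loss_rot = F.cross_entropy(rot_head(backbone(x_rot)), rot_labels)
    return loss_ssl + lambda_rot * loss_rot
```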
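
The Dataset Splits row quotes the paper's kNN accuracy on the validation set. A common way to compute such a kNN probe is sketched below; the value of k, the cosine-similarity metric, and the similarity-weighted vote are assumptions here, not details confirmed by the excerpt.

```python
# Hedged sketch of a kNN probe: classify each validation feature by a
# similarity-weighted vote over the labels of its k nearest training features.
import torch
import torch.nn.functional as F


@torch.no_grad()
def knn_accuracy(train_feats, train_labels, val_feats, val_labels, k=200, num_classes=10):
    train_feats = F.normalize(train_feats, dim=1)
    val_feats = F.normalize(val_feats, dim=1)
    sim = val_feats @ train_feats.t()                    # (N_val, N_train) cosine similarities
    topk_sim, topk_idx = sim.topk(k, dim=1)
    topk_labels = train_labels[topk_idx]                 # (N_val, k) labels of nearest neighbours
    # Similarity-weighted vote over the k neighbours' labels.
    votes = torch.zeros(val_feats.size(0), num_classes, device=val_feats.device)
    votes.scatter_add_(1, topk_labels, topk_sim)
    pred = votes.argmax(dim=1)
    return (pred == val_labels).float().mean().item() * 100.0
```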
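
The Experiment Setup row lists the CIFAR-10 configuration. The sketch below translates those settings into PyTorch/torchvision objects (backbone surgery, projector, augmentation pipeline, optimizer, and a warmup-plus-cosine schedule). The internal composition of the projector, the per-epoch schedule granularity, the absence of any batch-size learning-rate scaling, and all variable names are assumptions; the official repository is the authoritative reference.

```python
# Hedged sketch of the quoted CIFAR-10 training configuration.
import math
import torch
import torch.nn as nn
import torchvision.transforms as T
from torchvision.models import resnet18

# CIFAR-10 variant of ResNet-18: 3x3 stem with stride 1, padding 1, no max pooling afterwards.
backbone = resnet18()
backbone.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
backbone.maxpool = nn.Identity()
backbone.fc = nn.Identity()  # expose the 512-d pooled features

# Two-layer projector, hidden/output dimension 2048; last BN has no learnable affine parameters.
projector = nn.Sequential(
    nn.Linear(512, 2048),
    nn.BatchNorm1d(2048),
    nn.ReLU(inplace=True),
    nn.Linear(2048, 2048),
    nn.BatchNorm1d(2048, affine=False),
)

# Augmentation pipeline quoted in the Experiment Setup row.
augment = T.Compose([
    T.RandomResizedCrop(32, scale=(0.2, 1.0), ratio=(3 / 4, 4 / 3)),
    T.RandomHorizontalFlip(p=0.5),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

# SGD with momentum 0.9, weight decay 5e-4; 800 epochs with a 10-epoch linear warmup
# followed by cosine decay. base_lr = 0.06 for E-SimCLR/E-SimSiam (0.03 for the baselines).
epochs, warmup_epochs, base_lr = 800, 10, 0.06
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(projector.parameters()),
    lr=base_lr, momentum=0.9, weight_decay=5e-4,
)


def lr_at_epoch(epoch: int) -> float:
    """Linear warmup followed by cosine decay to zero."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# At the start of each epoch:
# for group in optimizer.param_groups:
#     group["lr"] = lr_at_epoch(epoch)
```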