Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning

Authors: Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that merely combining our equivariant loss with a non-collapse term results in non-trivial representations, without requiring invariance to data augmentations. Optimal performance is achieved by also encouraging approximate invariance, where input augmentations correspond to small rotations. Our method, CARE: Contrastive Augmentation-induced Rotational Equivariance, leads to improved performance on downstream tasks, and ensures sensitivity in embedding space to important variations in data (e.g., color) that standard contrastive methods do not achieve. Code is available at https://github.com/Sharut/CARE.
Researcher Affiliation | Academia | Sharut Gupta* MIT CSAIL sharut@mit.edu; Joshua Robinson* MIT CSAIL joshrob@mit.edu; Derek Lim MIT CSAIL dereklim@mit.edu; Soledad Villar Johns Hopkins University svillar3@jhu.edu; Stefanie Jegelka MIT CSAIL stefje@mit.edu
Pseudocode | Yes | Algorithm 1 presents PyTorch-based pseudocode for implementing CARE.
Open Source Code | Yes | Code is available at https://github.com/Sharut/CARE.
Open Datasets | Yes | We consider the problem of learning representations of proteins from the Protein Data Bank (Burley et al., 2021). ... We compare ResNet-18 models pretrained with CARE and with SimCLR on CIFAR10. ... We train ResNet-50 models on four datasets: CIFAR10, CIFAR100, STL10, and ImageNet100 ... We consider an image retrieval task on the Flowers-102 dataset (Nilsback & Zisserman, 2008).
Dataset Splits | No | The paper does not explicitly provide specific percentages or sample counts for training, validation, and test splits. It mentions 'linear probe' evaluation, where a linear classifier is trained on 'frozen features', implying standard splits may be used, but does not detail them. For example, it says 'linear classifier on frozen features for 100 epochs' but does not specify a validation set size or strategy.
Hardware Specification | Yes | All experiments were performed on an HPC computing cluster using 4 NVIDIA Tesla V100 GPUs with 32GB accelerator RAM for a single training run. The CPUs used were Intel Xeon Gold 6248 processors with 40 cores and 384GB RAM.
Software Dependencies | No | All experiments use the PyTorch deep learning framework (Paszke et al., 2019). No specific version number is provided for PyTorch or any other software dependency, only the framework name and a citation.
Experiment Setup | Yes | All encoders have ResNet-50 backbones and are trained for 400 epochs with temperature τ = 0.5 for SimCLR and τ = 0.1 for MoCo-v2*. The encoded features have a dimension of 2048 and are further processed by a two-layer MLP projection head, producing an output dimension of 128. A batch size of 256 was used for all datasets. For CIFAR10 and CIFAR100, we employed the Adam optimizer with a learning rate of 1e-3 and weight decay of 1e-6. For STL10, we employed the SGD optimizer with a learning rate of 0.06, utilizing cosine annealing and a weight decay of 5e-4, with 10 warmup steps.
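As a reading aid only (not the authors' implementation; see their repository for the real code), the loss structure quoted above — an equivariance term in which data augmentations act as rotations in embedding space, combined with a standard contrastive term that prevents collapse — can be sketched in NumPy. All function names, the squared-error form of the equivariance penalty, and the weighting parameter `lam` are assumptions for illustration:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """Standard InfoNCE contrastive loss; matched rows of z1/z2 are positives.
    This is the 'non-collapse' term: it spreads embeddings apart."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                    # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))     # positives sit on the diagonal

def rotation_equivariance_term(z1, z2, R):
    """Penalize deviation from f(aug(x)) ≈ R f(x), i.e. the augmentation
    acting as a rotation R in embedding space (squared-error form assumed)."""
    return float(np.mean(np.sum((z2 - z1 @ R.T) ** 2, axis=1)))

def care_style_loss(z1, z2, R, lam=1.0, tau=0.5):
    """Hypothetical combination: equivariance term + contrastive term,
    mirroring the structure described in the abstract."""
    return rotation_equivariance_term(z1, z2, R) + lam * info_nce(z1, z2, tau)
```

A rotation can be drawn as the Q factor of a QR decomposition of a Gaussian matrix; when the second view's embedding is exactly the rotated first view (z2 = z1 Rᵀ), the equivariance term vanishes and only the contrastive term remains.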