Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning
Authors: Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that merely combining our equivariant loss with a non-collapse term results in non-trivial representations, without requiring invariance to data augmentations. Optimal performance is achieved by also encouraging approximate invariance, where input augmentations correspond to small rotations. Our method, CARE: Contrastive Augmentation-induced Rotational Equivariance, leads to improved performance on downstream tasks, and ensures sensitivity in embedding space to important variations in data (e.g., color) that standard contrastive methods do not achieve. Code is available at https://github.com/Sharut/CARE. |
| Researcher Affiliation | Academia | Sharut Gupta* MIT CSAIL sharut@mit.edu Joshua Robinson* MIT CSAIL joshrob@mit.edu Derek Lim MIT CSAIL dereklim@mit.edu Soledad Villar Johns Hopkins University svillar3@jhu.edu Stefanie Jegelka MIT CSAIL stefje@mit.edu |
| Pseudocode | Yes | Algorithm 1 presents PyTorch-based pseudocode for implementing CARE. |
| Open Source Code | Yes | Code is available at https://github.com/Sharut/CARE. |
| Open Datasets | Yes | We consider the problem of learning representations of proteins from the Protein Data Bank (Burley et al., 2021). ... We compare ResNet-18 models pretrained with CARE and with SimCLR on CIFAR10. ... We train ResNet-50 models on four datasets: CIFAR10, CIFAR100, STL10, and ImageNet100 ... We consider an image retrieval task on the Flowers-102 dataset (Nilsback & Zisserman, 2008). |
| Dataset Splits | No | The paper does not explicitly provide percentages or sample counts for training, validation, and test splits. It mentions 'linear probe' evaluation in which a linear classifier is trained on 'frozen features', implying that standard splits may be used, but it does not detail them. For example, it states 'linear classifier on frozen features for 100 epochs' without specifying a validation set size or strategy. |
| Hardware Specification | Yes | All experiments were performed on an HPC computing cluster using 4 NVIDIA Tesla V100 GPUs with 32GB accelerator RAM for a single training run. The CPUs used were Intel Xeon Gold 6248 processors with 40 cores and 384GB RAM. |
| Software Dependencies | No | All experiments use the PyTorch deep learning framework (Paszke et al., 2019). No specific version number is provided for PyTorch or any other software dependency, only the framework name and a citation. |
| Experiment Setup | Yes | All encoders have ResNet-50 backbones and are trained for 400 epochs with temperature τ = 0.5 for SimCLR and τ = 0.1 for MoCo-v2*. The encoded features have a dimension of 2048 and are further processed by a two-layer MLP projection head, producing an output dimension of 128. A batch size of 256 was used for all datasets. For CIFAR10 and CIFAR100, we employed the Adam optimizer with a learning rate of 1e-3 and weight decay of 1e-6. For STL10, we employed the SGD optimizer with a learning rate of 0.06, utilizing cosine annealing and a weight decay of 5e-4, with 10 warmup steps. |
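The hyperparameters quoted in the Experiment Setup row can be collected into a small config sketch. This is an illustrative summary only, assuming the naming below; the dict keys and the `config_for` helper are not from the paper or its released code.

```python
# Shared pretraining settings reported in the paper's excerpt.
COMMON = {
    "backbone": "ResNet-50",
    "epochs": 400,
    "feature_dim": 2048,      # encoder output dimension
    "projection_dim": 128,    # two-layer MLP projection head output
    "batch_size": 256,
    "temperature": {"SimCLR": 0.5, "MoCo-v2": 0.1},
}

# Per-dataset optimizer choices reported in the paper's excerpt.
PER_DATASET = {
    "CIFAR10":  {"optimizer": "Adam", "lr": 1e-3, "weight_decay": 1e-6},
    "CIFAR100": {"optimizer": "Adam", "lr": 1e-3, "weight_decay": 1e-6},
    "STL10":    {"optimizer": "SGD", "lr": 0.06, "weight_decay": 5e-4,
                 "schedule": "cosine_annealing", "warmup_steps": 10},
}

def config_for(dataset: str) -> dict:
    """Merge the shared settings with a dataset-specific optimizer config."""
    return {**COMMON, **PER_DATASET[dataset]}
```

For example, `config_for("STL10")` yields the SGD configuration with cosine annealing, while the two CIFAR datasets share the same Adam settings.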