Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries

Authors: Charlotte Loh, Seungwook Han, Shivchander Sudalairaj, Rumen Dangovski, Kai Xu, Florian Wenzel, Marin Soljacic, Akash Srivastava

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We provide extensive empirical results and analyses to demonstrate the superior performance of our method in classification performance, uncertainty quantification and transfer learning on new datasets." |
| Researcher Affiliation | Collaboration | MIT EECS; MIT-IBM Watson AI Lab; Amazon (work done outside of Amazon); AWS (work done outside of Amazon); MIT Physics. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Our code is available at https://github.com/clott3/multi-sym-ensem" |
| Open Datasets | Yes | "We motivate this with an intuitive example of rotational symmetry on the ImageNet (Deng et al., 2009) dataset." |
| Dataset Splits | Yes | "Following the approach in (Kornblith et al., 2018), we performed hyperparameter tuning for each model-dataset combination and selected the best hyperparameters using a validation set." (See the validation-based grid-search sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as the SGD optimizer and BYOL's augmentations, but does not provide version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | "All contrastive learning models were trained for 800 epochs with a batch size of 4096. For the equivariant models, h_m is a 3-layer MLP and λ is fixed to 0.4. After contrastive pre-training, we initialized a linear layer for each backbone and fine-tuned them end-to-end for 100 epochs using the SGD optimizer with a cosine decay learning rate schedule. We conducted a grid search to optimize the learning rate hyperparameter for each downstream task." (See the sketches after the table.) |
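
The Experiment Setup row names two concrete pre-training details: the equivariant models use a 3-layer MLP head h_m with a loss weight λ fixed to 0.4. As a rough illustration of how such a head typically enters an equivariant contrastive objective, here is a minimal PyTorch sketch assuming an E-SSL-style formulation in which h_m predicts the transformation (here, a four-fold rotation) applied to the input. Apart from the 3-layer MLP and λ = 0.4, every detail below (feature dimensions, rotation classes, loss form) is an assumption, not the paper's definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative only: the paper states that h_m is a 3-layer MLP and that
# lambda is fixed to 0.4; the rotation-prediction objective, dimensions,
# and four rotation classes below are assumptions.
FEAT_DIM, HIDDEN = 2048, 2048

h_m = nn.Sequential(                       # "h_m is a 3-layer MLP"
    nn.Linear(FEAT_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 4),                  # predict which of 4 rotations was applied
)
LAM = 0.4                                  # "lambda is fixed to 0.4"

def equivariant_objective(contrastive_loss, features, rotation_labels):
    """Contrastive (invariance) term plus a lambda-weighted equivariance term.

    `features` are encoder outputs for rotated views; `rotation_labels` index
    the rotation (0/90/180/270 degrees) applied to each view.
    """
    equiv_loss = F.cross_entropy(h_m(features), rotation_labels)
    return contrastive_loss + LAM * equiv_loss
```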
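
The fine-tuning stage quoted in the Experiment Setup row (a freshly initialized linear layer per backbone, trained end-to-end for 100 epochs with SGD under a cosine decay schedule) maps onto a fairly standard PyTorch loop. Below is a minimal sketch assuming a ResNet-50 backbone and an ImageNet-style loader; the learning rate argument stands in for the grid-searched value.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def fine_tune(train_loader, lr, epochs=100):
    """End-to-end fine-tuning as described in the Experiment Setup row.

    The ResNet-50 backbone and 1000-class head are assumptions for
    illustration; only the 100 epochs, SGD, and cosine decay are stated.
    """
    backbone = resnet50()
    backbone.fc = nn.Identity()                    # strip the default classifier
    model = nn.Sequential(backbone, nn.Linear(2048, 1000))  # fresh linear head

    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):                        # "fine-tuned ... for 100 epochs"
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()                           # advance the cosine decay per epoch
    return model
```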
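
Both the Dataset Splits and Experiment Setup rows describe choosing the learning rate per model-dataset pair on a validation set. A generic sketch of that selection, reusing the `fine_tune` function above, might look as follows; the candidate grid is an illustrative assumption, since the excerpt does not list the values searched.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader):
    """Top-1 accuracy on a held-out split."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        correct += (model(images).argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct / total

def grid_search_lr(train_loader, val_loader, grid=(1e-3, 1e-2, 5e-2, 1e-1)):
    """Return the lr whose fine-tuned model scores best on the validation set.

    The grid values are assumptions; the paper only states that a grid
    search over the learning rate was performed per downstream task.
    """
    best_lr, best_acc = None, -1.0
    for lr in grid:
        model = fine_tune(train_loader, lr)        # the fine-tuning sketch above
        acc = top1_accuracy(model, val_loader)     # select on the validation set
        if acc > best_acc:
            best_lr, best_acc = lr, acc
    return best_lr
```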