Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries
Authors: Charlotte Loh, Seungwook Han, Shivchander Sudalairaj, Rumen Dangovski, Kai Xu, Florian Wenzel, Marin Soljacic, Akash Srivastava
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide extensive empirical results and analyses to demonstrate the superior performance of our method in classification performance, uncertainty quantification and transfer learning on new datasets. |
| Researcher Affiliation | Collaboration | ¹MIT EECS, ²MIT-IBM Watson AI Lab, ³Amazon (work done outside of Amazon), ⁴AWS (work done outside of Amazon), ⁵MIT Physics. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/clott3/multi-sym-ensem |
| Open Datasets | Yes | We motivate this with an intuitive example of rotational symmetry on the ImageNet (Deng et al., 2009) dataset. |
| Dataset Splits | Yes | Following the approach in (Kornblith et al., 2018), we performed hyperparameter tuning for each model-dataset combination and selected the best hyperparameters using a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as the 'SGD optimizer' and 'BYOL's augmentations' but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | All contrastive learning models were trained for 800 epochs with a batch size of 4096. For the equivariant models, h_m is a 3-layer MLP and λ is fixed to 0.4. After contrastive pre-training, we initialized a linear layer for each backbone and fine-tuned them end-to-end for 100 epochs using the SGD optimizer with a cosine decay learning rate schedule. We conducted a grid search to optimize the learning rate hyperparameter for each downstream task. (A minimal sketch of this fine-tuning stage is given after the table.) |
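The fine-tuning recipe quoted above (fresh linear head per backbone, end-to-end training for 100 epochs with SGD and a cosine decay schedule) can be sketched as follows. This is a minimal illustration assuming a PyTorch-style setup; names such as `backbone`, `train_loader`, `num_classes`, and the `output_dim` attribute are hypothetical placeholders, not identifiers from the authors' released code at https://github.com/clott3/multi-sym-ensem.

```python
import torch
import torch.nn as nn

def finetune_member(backbone, train_loader, num_classes, epochs=100, lr=0.1):
    """Attach a fresh linear head to one ensemble member and fine-tune
    the full network end-to-end with SGD + cosine decay (hedged sketch)."""
    # Assumes the backbone exposes its feature dimension; in practice this
    # would be the encoder's representation size (e.g. 2048 for ResNet-50).
    head = nn.Linear(backbone.output_dim, num_classes)
    model = nn.Sequential(backbone, head)

    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # cosine decay stepped once per epoch
    return model
```

The paper's excerpt states that the learning rate was chosen by grid search per model-dataset combination on a validation set; the candidate grid is not given, so any specific values passed as `lr` here would be an assumption.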