Emergent Equivariance in Deep Ensembles

Authors: Jan E. Gerken, Pan Kessel

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify our theoretical insights using detailed numerical experiments. ... We empirically demonstrate the emergent equivariance in three settings: Ising model, Fashion MNIST, and a high-dimensional medical dataset of histological slices.
Researcher Affiliation | Collaboration | (1) Department of Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, SE-412 96 Gothenburg, Sweden; (2) Prescient Design, Genentech Roche, Basel, Switzerland. Correspondence to: Jan Gerken <gerken@chalmers.se>, Pan Kessel <pan.kessel@roche.com>.
Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | No | The paper does not include an unambiguous statement or a direct link indicating that the source code for the methodology described in this paper is publicly available.
Open Datasets | Yes | We train convolutional neural networks augmenting the original dataset (Xiao et al., 2017)... We trained ensembles of CNNs on the NCT-CRC-HE-100K dataset (Kather et al., 2018). The references provide further details: 'Xiao, H., Rasul, K., and Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.' and 'Kather, J. N., Halama, N., and Marx, A. 100,000 histological images of human colorectal cancer and healthy tissue. April 2018. doi: 10.5281/zenodo.1214456.'
Dataset Splits | Yes | In order to make the task more challenging, we only use 10k randomly selected samples, train on 11/12th of this subset and validate on the remaining 1/12th.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions 'JAX package neural-tangents' and 'pytorch lightning' but does not specify version numbers for these software components.
Experiment Setup | Yes | We train for 100k steps of full-batch gradient descent with learning rate 0.5 for network widths 128, 512 and 1024 and learning rate 1.0 for network width 2048. ... We use the ADAM optimizer with the standard learning rate of pytorch lightning, i.e., 1e-3. We train for 10 epochs on the augmented dataset. ... We trained the ensembles with the Adam optimizer using a learning rate of 0.001 on batches of size 16.
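The Research Type row quotes the paper's central empirical claim: a deep ensemble trained with data augmentation becomes (approximately) equivariant as a collective, even if no individual member is. The snippet below is a minimal sketch of how such a check could look, assuming for illustration the cyclic group of 90-degree rotations, randomly initialized placeholder CNNs, and Fashion-MNIST-shaped dummy inputs; none of these choices reproduce the paper's actual architectures, training, or datasets.

```python
import torch


def ensemble_logits(models, x):
    """Average the logits of all ensemble members (the ensemble prediction)."""
    with torch.no_grad():
        return torch.stack([m(x) for m in models], dim=0).mean(dim=0)


def c4_equivariance_gap(models, x):
    """Largest deviation between ensemble predictions on rotated and original inputs.

    For classification the group acts trivially on the labels, so equivariance
    reduces to invariance of the predicted probabilities.
    """
    base = ensemble_logits(models, x).softmax(dim=-1)
    gaps = []
    for k in range(1, 4):  # the three non-trivial 90-degree rotations
        rotated = torch.rot90(x, k, dims=(2, 3))
        gaps.append((ensemble_logits(models, rotated).softmax(dim=-1) - base).abs().max())
    return max(g.item() for g in gaps)


if __name__ == "__main__":
    # Hypothetical stand-in members: small CNNs with random weights.
    def make_member():
        return torch.nn.Sequential(
            torch.nn.Conv2d(1, 8, 3, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
            torch.nn.Linear(8, 10),
        ).eval()

    ensemble = [make_member() for _ in range(16)]
    images = torch.randn(32, 1, 28, 28)  # Fashion-MNIST-shaped dummy batch
    print(f"max |p(g.x) - p(x)| over C4: {c4_equivariance_gap(ensemble, images):.4f}")
```

In an actual reproduction, this gap would be evaluated on trained members and tracked as a function of ensemble size and network width, mirroring the widths quoted in the Experiment Setup row.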
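The Dataset Splits and Experiment Setup rows quote concrete numbers: 10k randomly selected samples split 11/12 vs. 1/12, Adam with learning rate 1e-3, 10 training epochs, and batches of size 16. The sketch below strings these quoted values together into one illustrative PyTorch training loop on Fashion-MNIST (Xiao et al., 2017). The placeholder linear model, the omitted data augmentation, and the fact that the quoted hyperparameters come from different experiments in the paper all mean this is not the authors' setup, only a sketch under those assumptions.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Fashion-MNIST is openly available; augmentation is omitted in this sketch.
full = datasets.FashionMNIST("data", train=True, download=True,
                             transform=transforms.ToTensor())

# "only use 10k randomly selected samples, train on 11/12th of this subset
#  and validate on the remaining 1/12th" (split sizes rounded to integers)
gen = torch.Generator().manual_seed(0)
subset, _ = random_split(full, [10_000, len(full) - 10_000], generator=gen)
n_val = len(subset) // 12
train_set, val_set = random_split(subset, [len(subset) - n_val, n_val], generator=gen)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam with lr 1e-3, as quoted
loss_fn = torch.nn.CrossEntropyLoss()

train_loader = DataLoader(train_set, batch_size=16, shuffle=True)  # batch size 16, as quoted
for epoch in range(10):                                            # 10 epochs, as quoted
    for x, y in train_loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
```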