Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers

Authors: Johann Schmidt, Sebastian Stober

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluated our method on several benchmark datasets, including a synthesised ImageNet test set. ITS outperforms the utilised baselines on all zero-shot test scenarios."
Researcher Affiliation | Academia | "Artificial Intelligence Lab, Otto-von-Guericke University, Magdeburg, Germany. Correspondence to: Johann Schmidt <johann.schmidt@ovgu.de>."
Pseudocode | Yes | "Supplementary to the descriptions and illustrations in Figure 4 and Figure 5, we provide the pseudocode of our proposed algorithm in Algorithm 1."
Open Source Code | Yes | "More details can be found in our publicly available source code: www.github.com/johSchm/ITS"
Open Datasets | Yes | "We evaluated our method on several benchmark datasets, including a synthesised ImageNet test set. [...] We trained a CNN, a GCNN (Cohen & Welling, 2016) and a RotDCF (Cheng et al., 2018) on the vanilla (canonical) MNIST. [...] Si-Score (short SI) (Djolonga et al., 2021) is a synthetic vision dataset for robustness testing, comprising semantically masked ImageNet (Russakovsky et al., 2015) objects"
Dataset Splits | Yes | "We split the vanilla datasets into disjunct training, validation, and test sets. We always employ the vanilla training set to fit the model and validate it on the vanilla validation set."
Hardware Specification | Yes | "All experiments are performed on an Nvidia A40 GPU (48GB) node with 1 TB RAM, 2x 24-core AMD EPYC 74F3 CPU @ 3.20GHz, and a local SSD (NVMe)."
Software Dependencies | No | "The software specifications of our implementations can be found in our open-sourced code."
Experiment Setup | Yes | "If not further specified, we used zero-padding to define areas outside the pixel space Ω, bilinear interpolation, and a group cardinality of n = 17. [...] These models are trained with the AdamW optimizer (Loshchilov & Hutter, 2017) using default parameters. We minimised the negative log-likelihood using ground-truth image labels. We used a learning rate of 5e-3, 3 epochs for MNIST, 5 epochs for Fashion-MNIST, 10 epochs for GTSRB and mini-batches of size 128."
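The quoted setup (bilinear interpolation, zero-padding outside the pixel space Ω, a rotation group of cardinality n = 17, AdamW at learning rate 5e-3 with NLL loss) can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear classifier and the random dummy batch are placeholders, and only the rotation-orbit parameters and optimiser settings come from the paper's reported values.

```python
import torch
import torch.nn.functional as F

def rotate_batch(x: torch.Tensor, angle: torch.Tensor) -> torch.Tensor:
    """Rotate a batch of images: bilinear interpolation, zeros outside Omega."""
    cos, sin = torch.cos(angle), torch.sin(angle)
    theta = torch.zeros(x.size(0), 2, 3)        # one 2x3 affine matrix per image
    theta[:, 0, 0], theta[:, 0, 1] = cos, -sin
    theta[:, 1, 0], theta[:, 1, 1] = sin, cos
    grid = F.affine_grid(theta, list(x.size()), align_corners=False)
    return F.grid_sample(x, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=False)

n = 17                                          # group cardinality from the paper
angles = torch.linspace(0, 2 * torch.pi, n + 1)[:-1]   # n evenly spaced rotations
x = torch.randn(8, 1, 28, 28)                   # dummy MNIST-sized batch
orbit = torch.stack([rotate_batch(x, a) for a in angles])  # (n, B, C, H, W)

# Placeholder classifier trained with NLL and AdamW (default betas), lr 5e-3;
# the paper uses mini-batches of size 128, shrunk here for brevity.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(28 * 28, 10),
                            torch.nn.LogSoftmax(dim=1))
opt = torch.optim.AdamW(model.parameters(), lr=5e-3)
loss = F.nll_loss(model(x), torch.randint(0, 10, (8,)))
loss.backward()
opt.step()
```

Stacking the rotated copies yields the orbit of each image under the discretised rotation group, which is the kind of transformed test input the zero-shot evaluations above probe.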