Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers

Authors: Johann Schmidt, Sebastian Stober

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluated our method on several benchmark datasets, including a synthesised ImageNet test set. ITS outperforms the utilised baselines on all zero-shot test scenarios.
Researcher Affiliation Academia Artificial Intelligence Lab, Otto-von-Guericke University, Magdeburg, Germany. Correspondence to: Johann Schmidt <EMAIL>.
Pseudocode Yes Supplementary to the descriptions and illustrations in Figure 4 and Figure 5, we provide the pseudocode of our proposed algorithm in Algorithm 1.
Open Source Code Yes More details can be found in our publicly available source code: www.github.com/johSchm/ITS
Open Datasets Yes We evaluated our method on several benchmark datasets, including a synthesised ImageNet test set. [...] We trained a CNN, a GCNN (Cohen & Welling, 2016) and a RotDCF (Cheng et al., 2018) on the vanilla (canonical) MNIST. [...] Si-Score (short SI) (Djolonga et al., 2021) is a synthetic vision dataset for robustness testing, comprising semantically masked ImageNet (Russakovsky et al., 2015) objects
Dataset Splits Yes We split the vanilla datasets into disjunct training, validation, and test sets. We always employ the vanilla training set to fit the model and validate it on the vanilla validation set.
Hardware Specification Yes All experiments are performed on an Nvidia A40 GPU (48GB) node with 1 TB RAM, 2x 24core AMD EPYC 74F3 CPU @ 3.20GHz, and a local SSD (NVMe).
Software Dependencies No The software specifications of our implementations can be found in our open-sourced code.
Experiment Setup Yes If not further specified, we used zero-padding to define areas outside the pixel space Ω, bilinear interpolation, and a group cardinality of n = 17. [...] These models are trained with the AdamW optimizer (Loshchilov & Hutter, 2017) using default parameters. We minimised the negative log-likelihood using ground-truth image labels. We used a learning rate of 5e-3, 3 epochs for MNIST, 5 epochs for Fashion-MNIST, 10 epochs for GTSRB and mini-batches of size 128.
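For quick reference, the reported experiment setup can be collected into a small configuration sketch. Only the values quoted above are taken from the paper; the dictionary layout and the helper function name `epochs_for` are illustrative, not part of the authors' released code.

```python
# Hyperparameters as quoted in the "Experiment Setup" row of this report.
# Structure and helper names are illustrative assumptions.
HYPERPARAMS = {
    "optimizer": "AdamW",               # default parameters (Loshchilov & Hutter, 2017)
    "learning_rate": 5e-3,
    "batch_size": 128,
    "loss": "negative log-likelihood",  # on ground-truth image labels
    "padding": "zero",                  # areas outside the pixel space Ω
    "interpolation": "bilinear",
    "group_cardinality": 17,            # n = 17
    "epochs": {"MNIST": 3, "Fashion-MNIST": 5, "GTSRB": 10},
}

def epochs_for(dataset: str) -> int:
    """Return the reported epoch budget for a given dataset."""
    return HYPERPARAMS["epochs"][dataset]
```

A downstream reproduction script could read this mapping to configure its training loop, e.g. `epochs_for("GTSRB")` yields the 10 epochs reported for that dataset.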