Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers
Authors: Johann Schmidt, Sebastian Stober
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our method on several benchmark datasets, including a synthesised ImageNet test set. ITS outperforms the utilised baselines on all zero-shot test scenarios. |
| Researcher Affiliation | Academia | Artificial Intelligence Lab, Otto-von-Guericke University, Magdeburg, Germany. Correspondence to: Johann Schmidt <EMAIL>. |
| Pseudocode | Yes | Supplementary to the descriptions and illustrations in Figure 4 and Figure 5, we provide the pseudocode of our proposed algorithm in Algorithm 1. |
| Open Source Code | Yes | More details can be found in our publicly available source code (footnote 2: www.github.com/johSchm/ITS). |
| Open Datasets | Yes | We evaluated our method on several benchmark datasets, including a synthesised ImageNet test set. [...] We trained a CNN, a GCNN (Cohen & Welling, 2016) and a RotDCF (Cheng et al., 2018) on the vanilla (canonical) MNIST. [...] Si-Score (short SI) (Djolonga et al., 2021) is a synthetic vision dataset for robustness testing, comprising semantically masked ImageNet (Russakovsky et al., 2015) objects. |
| Dataset Splits | Yes | We split the vanilla datasets into disjoint training, validation, and test sets. We always employ the vanilla training set to fit the model and validate it on the vanilla validation set. |
| Hardware Specification | Yes | All experiments are performed on an Nvidia A40 GPU (48 GB) node with 1 TB RAM, 2× 24-core AMD EPYC 74F3 CPUs @ 3.20 GHz, and a local NVMe SSD. |
| Software Dependencies | No | The software specifications of our implementations can be found in our open-sourced code. |
| Experiment Setup | Yes | If not further specified, we used zero-padding to define areas outside the pixel space Ω, bilinear interpolation, and a group cardinality of n = 17. [...] These models are trained with the AdamW optimizer (Loshchilov & Hutter, 2017) using default parameters. We minimised the negative log-likelihood using ground-truth image labels. We used a learning rate of 5e-3, 3 epochs for MNIST, 5 epochs for Fashion-MNIST, 10 epochs for GTSRB, and mini-batches of size 128. |
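The Experiment Setup row fixes three transformation settings (group cardinality n = 17, zero-padding outside Ω, bilinear interpolation) plus the training hyperparameters. The pure-Python sketch below illustrates what the transformation settings mean for a single grayscale image. It assumes, which the table does not state, that the group is a discretisation of planar rotations into n equally spaced angles. The names `TRAIN_CONFIG`, `rotation_angles`, `sample`, `rotate`, and `orbit` are hypothetical, and this is not the authors' ITS implementation (their repository is the reference for that).

```python
import math
from typing import List

Image = List[List[float]]  # row-major grayscale image, img[y][x]

# Hyperparameters as reported in the Experiment Setup row (hypothetical dict name).
TRAIN_CONFIG = {
    "optimizer": "AdamW",  # otherwise default parameters, per the setup
    "learning_rate": 5e-3,
    "batch_size": 128,
    "epochs": {"MNIST": 3, "Fashion-MNIST": 5, "GTSRB": 10},
    "group_cardinality": 17,
}


def rotation_angles(n: int) -> List[float]:
    """ASSUMPTION: the group is n equally spaced rotation angles (radians) in [0, 2*pi)."""
    return [2.0 * math.pi * k / n for k in range(n)]


def sample(img: Image, y: float, x: float) -> float:
    """Bilinear interpolation; coordinates outside the pixel grid read as 0 (zero-padding)."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def px(r: int, c: int) -> float:
        # Zero-padding: anything outside the pixel space contributes 0.
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - dy) * (1 - dx) * px(y0, x0) + (1 - dy) * dx * px(y0, x0 + 1)
            + dy * (1 - dx) * px(y0 + 1, x0) + dy * dx * px(y0 + 1, x0 + 1))


def rotate(img: Image, theta: float) -> Image:
    """Rotate about the image centre by theta using inverse mapping."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out: Image = []
    for r in range(h):
        row = []
        for c in range(w):
            # Each output pixel samples the source at its position rotated by -theta.
            x, y = c - cx, r - cy
            src_x = cos_t * x + sin_t * y + cx
            src_y = -sin_t * x + cos_t * y + cy
            row.append(sample(img, src_y, src_x))
        out.append(row)
    return out


def orbit(img: Image, n: int) -> List[Image]:
    """All n transformed copies of img, one per (assumed) group element."""
    return [rotate(img, t) for t in rotation_angles(n)]
```

For example, `orbit(img, TRAIN_CONFIG["group_cardinality"])` yields 17 rotated copies of `img`; corners that rotate in from outside the grid are filled with zeros, and off-grid sample points are blended bilinearly. Inverse mapping is used so that every output pixel receives exactly one interpolated value, with no holes.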