Equivariant Transformer Networks

Authors: Kai Sheng Tai, Peter Bailis, Gregory Valiant

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of ETs using both synthetic and real-world image classification tasks. Empirically, ET layers improve the sample efficiency of image classifiers relative to standard Spatial Transformer layers (Jaderberg et al., 2015). In particular, we demonstrate that ET layers improve the sample efficiency of modern ResNet classifiers on the Street View House Numbers dataset, with relative improvements in error rate of up to 15% in the limited data regime.
Researcher Affiliation | Academia | Kai Sheng Tai, Peter Bailis, Gregory Valiant (Stanford University, Stanford, CA, USA). Correspondence to: Kai Sheng Tai <kst@cs.stanford.edu>.
Pseudocode | Yes | Algorithm 1 (Constructing a canonical coordinate system). Input: transformation group {T_θ}. Output: canonical coordinates ρ(x).
v_i(x) ← (∂(T_θ x)_i / ∂θ)|_{θ=0}, i = 1, 2
D_x ← v_1(x) ∂/∂x_1 + v_2(x) ∂/∂x_2
ρ_1(x) ← a solution of D_x ρ_1(x) = 1
ρ_2(x) ← a solution of D_x ρ_2(x) = 0
Return ρ(x) = (ρ_1(x), ρ_2(x))
(A worked rotation-group example appears after this table.)
Open Source Code | Yes | Our PyTorch implementation is available at github.com/stanford-futuredata/equivariant-transformers.
Open Datasets | Yes | We evaluate ETs on two image classification datasets: an MNIST variant where the digits are distorted under random projective transformations (Section 5.1), and the real-world Street View House Numbers (SVHN) dataset (Section 5.2). (A sketch of such a distortion pipeline appears after this table.)
Dataset Splits | Yes | The training set consists of 73,257 examples; we use a randomly-chosen subset of 5,000 examples for validation and use the remaining 68,257 examples for training. (A split sketch appears after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper states that "Our PyTorch implementation is available at github.com/stanford-futuredata/equivariant-transformers" but does not specify the version of PyTorch or of any other software dependency.
Experiment Setup | No | The paper mentions some architectural details, e.g., "3-layer CNNs with 32 channels in each layer" for the pose predictors, and refers to the dropout rate and learning rate schedule, but defers the specific values to the Appendix ("see the Appendix for details"). The main text therefore lacks concrete hyperparameter values. (A hypothetical pose-predictor sketch appears after this table.)
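
As a concrete illustration of Algorithm 1, here is a minimal NumPy sketch (ours, not the paper's) for the 2D rotation group. For rotations, the generator field is v(x) = (-x2, x1), and the canonical coordinates work out to polar coordinates: the angle solves D_x ρ_1 = 1 and the log-radius solves D_x ρ_2 = 0, so rotating the input becomes a pure translation in ρ_1.

```python
import numpy as np

def canonical_coords_rotation(x1, x2):
    """Canonical coordinates for the 2D rotation group.

    For T_theta = rotation by theta, v(x) = (-x2, x1), so
    D_x = -x2 * d/dx1 + x1 * d/dx2.  Then
      rho1 = atan2(x2, x1)           solves D_x rho1 = 1 (angle),
      rho2 = log(sqrt(x1^2 + x2^2))  solves D_x rho2 = 0 (log-radius).
    """
    rho1 = np.arctan2(x2, x1)
    rho2 = 0.5 * np.log(x1 ** 2 + x2 ** 2)
    return rho1, rho2

# Equivariance check: a rotation by theta shifts rho1 by theta and
# leaves rho2 unchanged, i.e., rotation becomes translation.
theta = 0.3
x = np.array([1.0, 2.0])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
xr = R @ x
r1, r2 = canonical_coords_rotation(*x)
r1t, r2t = canonical_coords_rotation(*xr)
assert np.isclose((r1t - r1) % (2 * np.pi), theta)
assert np.isclose(r2t, r2)
```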
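
The projectively-distorted MNIST variant from the Open Datasets row can be approximated with off-the-shelf tooling. The sketch below uses torchvision's RandomPerspective as a stand-in for the paper's random projective transformations; the paper's exact distortion parameters are not given in the main text, so distortion_scale=0.5 is our assumption.

```python
from torchvision import datasets, transforms

# Hypothetical stand-in for the paper's "random projective
# transformations": RandomPerspective samples a random homography.
# distortion_scale is our assumption, not a value from the paper.
projective_mnist = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=transforms.Compose([
        transforms.RandomPerspective(distortion_scale=0.5, p=1.0),
        transforms.ToTensor(),
    ]),
)
img, label = projective_mnist[0]  # one randomly distorted digit
```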
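
The split described in the Dataset Splits row can be reproduced mechanically. A minimal sketch, assuming torchvision's SVHN loader and a seed of our own choosing:

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# SVHN's standard "train" split has 73,257 examples; hold out a random
# 5,000-example validation subset, leaving 68,257 for training.
# The seed is our own choice, not one reported in the paper.
svhn_train_full = datasets.SVHN(
    root="data", split="train", download=True,
    transform=transforms.ToTensor(),
)
assert len(svhn_train_full) == 73257
train_set, val_set = random_split(
    svhn_train_full, [68257, 5000],
    generator=torch.Generator().manual_seed(0),
)
```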
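
Finally, for the Experiment Setup row: the only architectural detail quoted from the main text is "3-layer CNNs with 32 channels in each layer" for the pose predictors. The sketch below shows one way such a predictor might look; kernel sizes, pooling, and the pose parameterization are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class PosePredictor(nn.Module):
    """Hypothetical 3-layer, 32-channel CNN pose predictor.

    Only "3-layer CNNs with 32 channels in each layer" comes from the
    paper's main text; kernel sizes, pooling, and the number of pose
    parameters below are illustrative assumptions.
    """
    def __init__(self, in_channels=1, num_pose_params=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool to 32 features
        )
        self.head = nn.Linear(32, num_pose_params)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

poses = PosePredictor()(torch.randn(8, 1, 64, 64))  # shape: (8, 1)
```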