Spatial Transformer Networks
Authors: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we explore the use of spatial transformer networks on a number of supervised learning tasks. In Sect. 4.1 we begin with experiments on distorted versions of the MNIST handwriting dataset, showing the ability of spatial transformers to improve classification performance through actively transforming the input images. In Sect. 4.2 we test spatial transformer networks on a challenging real-world dataset, Street View House Numbers [21], for number recognition, showing state-of-the-art results using multiple spatial transformers embedded in the convolutional stack of a CNN. Finally, in Sect. 4.3, we investigate the use of multiple parallel spatial transformers for fine-grained classification, showing state-of-the-art performance on CUB-200-2011 birds dataset [28] by automatically discovering object parts and learning to attend to them. |
| Researcher Affiliation | Industry | Google DeepMind, London, UK {jaderberg,simonyan,zisserman,korayk}@google.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a code repository, link, or any other concrete access to source code for the methodology it describes. |
| Open Datasets | Yes | We begin with experiments where we train different neural network models to classify MNIST data that has been distorted in various ways: rotation (R); rotation, scale and translation (RTS); projective transformation (P); elastic warping (E)..., We now test our spatial transformer networks on a challenging real-world dataset, Street View House Numbers (SVHN) [21]., We evaluate our models on the CUB-200-2011 birds dataset [28], containing 6k training images and 5.8k test images, covering 200 species of birds. |
| Dataset Splits | No | The paper reports training and test set sizes for some datasets (e.g., CUB-200-2011, with 6k training images and 5.8k test images), but it does not give explicit training/validation/test splits (percentages or exact counts for all three partitions) for every experiment, nor does it reference pre-defined splits in full detail. |
| Hardware Specification | No | The paper does not specify the hardware used to run its experiments (exact GPU/CPU models, processor speeds, memory amounts, or other machine details). |
| Software Dependencies | No | The paper does not name the ancillary software (e.g., libraries or solvers with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | All networks have approximately the same number of parameters, are trained with identical optimisation schemes (backpropagation, SGD, scheduled learning rate decrease, with a multinomial cross entropy loss), and all with three weight layers in the classification network. All networks are trained from scratch with SGD and dropout [14], with randomly initialised weights, except for the regression layers of spatial transformers which are initialised to predict the identity transform. Affine transformations and bilinear sampling kernels are used for all spatial transformer networks in these experiments. (A minimal code sketch of this setup follows the table.) |
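
Since the paper provides neither pseudocode nor open-source code, below is a minimal sketch of the experiment setup quoted above, written in PyTorch. The localisation network (`loc`) and its layer sizes are hypothetical stand-ins; only the identity initialisation of the transformer's regression layer, the affine parameterisation, and the bilinear sampling kernel are taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineSpatialTransformer(nn.Module):
    """Affine spatial transformer: a localisation network regresses the
    6 affine parameters, then a bilinear sampling kernel warps the input."""

    def __init__(self, in_features: int):
        super().__init__()
        # Hypothetical localisation network; the paper uses different
        # architectures per experiment.
        self.loc = nn.Sequential(
            nn.Linear(in_features, 32),
            nn.ReLU(),
            nn.Linear(32, 6),  # regression layer: 6 affine parameters
        )
        # Per the paper: all weights are randomly initialised except the
        # regression layer, which is initialised to predict the identity
        # transform (zero weights, bias = flattened 2x3 identity matrix).
        nn.init.zeros_(self.loc[-1].weight)
        with torch.no_grad():
            self.loc[-1].bias.copy_(
                torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = x.size(0)
        theta = self.loc(x.view(n, -1)).view(n, 2, 3)
        # Sampling grid from the affine transform, then bilinear sampling.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, mode="bilinear", align_corners=False)

# Usage: warp a batch of MNIST-sized (1x28x28) images.
stn = AffineSpatialTransformer(in_features=1 * 28 * 28)
out = stn(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 1, 28, 28])
```

Because the regression layer starts at the identity, the transformer begins as a no-op and early gradients come entirely from the downstream classification loss, which is consistent with the paper's description of training all networks from scratch with SGD.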