CrossTransformers: spatially-aware few-shot transfer

Authors: Carl Doersch, Ankush Gupta, Andrew Zisserman

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate on Meta-Dataset [86], specifically the setting where the training is performed on the ImageNet train split only... Table 1: Effects of architecture and SimCLR Episodes on Prototypical Nets, for Meta-Dataset Train-on-ILSVRC. We ablate architectural choices... Table 2: CrossTransformers (CTX) comparison to state-of-the-art.
Researcher Affiliation | Collaboration | DeepMind, London; VGG, Department of Engineering Science, University of Oxford
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | Yes | Code available at: https://github.com/google-research/meta-dataset
Open Datasets | Yes | We evaluate on Meta-Dataset [86], specifically the setting where the training is performed on the ImageNet train split only
Dataset Splits | Yes | We evaluate on Meta-Dataset [86], specifically the setting where the training is performed on the ImageNet train split only, which is 712 classes (plus 158 classes for validation, which are not used for training but only to perform early stopping).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) were mentioned for running experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names like PyTorch with a version) were mentioned.
Experiment Setup | Yes | To ensure comparability, we followed the public implementation of Prototypical Nets for Meta-Dataset [86] wherever possible. This includes using the same hyperparameters, unless otherwise noted. For the hyperparameters that were chosen with a sweep on the validation set (learning rate schedule and weight decay), we simply used the best values discovered for Prototypical Nets for all the experiments in this paper... our experiments increase resolution to the standard 224×224 and use ResNet-34, and we also use normalized stochastic gradient descent [19, 59]... For experiments with CrossTransformers, we also increased the resolution of the convolutional feature map by setting the stride of the final block of the ResNet to 1, and using dilated convolutions to preserve the feature computation [33, 40].
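The setup quote changes the final ResNet stage's stride from 2 to 1, with dilated convolutions preserving the receptive field, which doubles the spatial resolution of the output feature map. A minimal sketch of that arithmetic, assuming the standard ResNet stride schedule (stem convolution and max-pool at stride 2, then four stages with strides 1, 2, 2, 2 — the schedule itself is an assumption from common ResNet implementations, not stated in the quote):

```python
def feature_map_size(input_size, strides):
    """Spatial size after a chain of strided stages (all sizes here divide evenly)."""
    size = input_size
    for s in strides:
        size //= s
    return size

# Standard ResNet-34 on a 224x224 input:
# stem conv (stride 2), max-pool (2), stages with strides 1, 2, 2, 2
standard = feature_map_size(224, [2, 2, 1, 2, 2, 2])

# CrossTransformers variant: final stage stride set to 1
# (dilated convolutions keep the receptive field unchanged)
dilated = feature_map_size(224, [2, 2, 1, 2, 2, 1])

print(standard, dilated)  # 7 14
```

So the CTX backbone produces a 14×14 grid of local features instead of the usual 7×7, giving the spatial attention four times as many support-set locations to attend over.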