Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Understanding the Role of Invariance in Transfer Learning

Authors: Till Speicher, Vedant Nanda, Krishna P. Gummadi

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we systematically investigate the importance of representational invariance for transfer learning, as well as how it interacts with other parameters during pretraining. To do so, we introduce a family of synthetic datasets that allow us to precisely control factors of variation both in training and test data. Using these datasets, we a) show that for learning representations with high transfer performance, invariance to the right transformations is as, or often more, important than most other factors such as the number of training samples, the model architecture and the identity of the pretraining classes, b) show conditions under which invariance can harm the ability to transfer representations and c) explore how transferable invariance is between tasks.
Researcher Affiliation | Academia | Till Speicher (MPI-SWS), Vedant Nanda (MPI-SWS), Krishna P. Gummadi (MPI-SWS)
Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. It describes methodologies in narrative text and mathematical formulas.
Open Source Code | Yes | The code is available at https://github.com/tillspeicher/representation-invariance-transfer.
Open Datasets | Yes | Therefore, to study invariance carefully, we introduce a family of synthetic datasets, Transforms-2D, that allows us to precisely control the differences and similarities between inputs in a model's training and test sets. [...] Foreground and background images are based on the SI-Score dataset (Djolonga et al., 2021) [...] The dataset is available under the following URL: https://github.com/google-research/si-score. It uses the Apache 2.0 license. [...] We subsample images to have a resolution of 32 x 32 pixels, since this size provides enough detail for models to be able to distinguish between different objects, even with transformations applied to them, while allowing for faster iteration times in terms of training and evaluation. [...] We also use the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009).
Dataset Splits | Yes | In our experiments we use 50,000 training samples to train models with specific invariances, as well as 10,000 validation and 10,000 test samples, if not stated differently. These numbers mimic the size of the CIFAR datasets (Krizhevsky et al., 2009).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | All models are implemented in PyTorch (Paszke et al., 2019). For all CNN models we use implementations adapted for CIFAR datasets available here: https://github.com/kuangliu/pytorch-cifar. For ViTs, we use an implementation from this repository: https://github.com/omihub777/ViT-CIFAR (achieving more than 80% accuracy on CIFAR-10) with a patch size of 8, 7 layers, 384 hidden and MLP units, 8 heads and no Dropout. We train models using PyTorch Lightning (Falcon & The PyTorch Lightning team, 2019) on the Transforms-2D dataset for 50 epochs and fine-tune their output layer (while keeping the rest of the network frozen) for 200 epochs.
Experiment Setup | Yes | We train models using PyTorch Lightning (Falcon & The PyTorch Lightning team, 2019) on the Transforms-2D dataset for 50 epochs and fine-tune their output layer (while keeping the rest of the network frozen) for 200 epochs. [...] All CNN models are trained and fine-tuned using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001. For ViTs, we use a cosine learning rate scheduler (Loshchilov & Hutter, 2016) with a learning rate that is decayed from 0.001 to 0.00001 over the duration of training, and a weight decay of 0.00001, with 5 warmup epochs. We keep the checkpoint that achieves the highest validation accuracy during training and fine-tuning.
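The ViT schedule quoted in the Experiment Setup row (cosine decay from 0.001 to 0.00001, with 5 warmup epochs) can be sketched in plain Python. Note this is an illustrative sketch, not the authors' implementation: the quoted text does not specify the warmup shape, so the linear warmup below is an assumption.

```python
import math

def cosine_lr(epoch, total_epochs, base_lr=1e-3, min_lr=1e-5, warmup_epochs=5):
    """Per-epoch learning rate: linear warmup (assumed shape), then cosine
    decay from base_lr down to min_lr over the remaining epochs."""
    if epoch < warmup_epochs:
        # Linear warmup ending exactly at base_lr on the last warmup epoch
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay: progress goes 0 -> ~1 over the post-warmup epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With the paper's quoted values (50 pretraining epochs), the rate ramps up over epochs 0-4, starts the cosine phase at 0.001, and decays toward 0.00001 by the final epoch.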