When is Transfer Learning Possible?

Authors: My Phan, Kianté Brantley, Stephanie Milani, Soroush Mehri, Gokul Swamy, Geoffrey J. Gordon

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we present case studies to illustrate how our work may be used to analyze and even challenge widely-held beliefs about transfer learning. We show that sparse mechanism shifts are neither necessary nor sufficient for transfer, and we show that freezing a layer of a network may either succeed or fail at transfer (Section 5). Experiment. Fig. 5 compares transfer performance of several methods.
Researcher Affiliation | Collaboration | Cornell University, Carnegie Mellon University, Elementera AI.
Pseudocode | Yes | Algorithm 1: Meta-Algorithm for Transfer
Open Source Code | No | The paper does not provide an explicit statement or a direct link to open-source code for the methodology it describes. It mentions using Ray Tune and references a GitHub link for a dataset, but not for its own implementation.
Open Datasets | Yes | The dataset and the pre-processing code to generate multiple environments of colored images from the original MNIST dataset, from Gulrajani & Lopez-Paz (2020), were used with image size s = 28 and number of color channels k = 2 (see the colored-MNIST sketch after this table).
Dataset Splits | Yes | We divide Environment 1 into train, val, and test sets with ratio 0.8, 0.1, 0.1 (see the split sketch after this table).
Hardware Specification | No | The paper only states that "the experiments that produce the figures in this paper were performed on a personal laptop." This description is too general and does not provide specific details such as GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper mentions using "Ray Tune" but does not specify its version number or any other key software dependencies with their versions, which are necessary for reproducibility.
Experiment Setup | Yes | We use Ray Tune to tune the hyperparameters. We sample the parameters from the configuration ranges in the table below and run the training process until the number of epochs is 100 or the standard deviation of the validation loss of the last 10 epochs is at most 0.01. Configuration ranges: Train in Env. 1: learning rate loguniform(0.01, 5), momentum 0.9, weight decay 0.0001; Train in Env. 3 (No Transfer): learning rate loguniform(0.00001, 0.01); Transfer Coefficient: 0.0001; Transfer Layer, Transfer Random Coefficient: learning rate loguniform(0.00001, 0.01); np: 20 or 50 uniformly; Batch size: 512; Number of sampled hyperparameters: 20. (See the Ray Tune sketch after this table.)
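
The colored-MNIST construction referenced under Open Datasets can be illustrated with a short sketch. This is not the authors' pre-processing code from Gulrajani & Lopez-Paz (2020): the binarization of digit labels, the three-way slicing of MNIST, and the per-environment color_flip_prob values are illustrative assumptions; only the 28 x 28 image size and the two color channels follow the quoted setup.

    import torch
    from torchvision import datasets

    def make_environment(images, labels, color_flip_prob):
        # Write each 28x28 grayscale digit into one of k = 2 color channels.
        # The channel normally matches the binarized label and is flipped with
        # probability color_flip_prob, giving each environment its own
        # spurious color/label correlation (an assumption of this sketch).
        images = images.float() / 255.0                      # (N, 28, 28)
        binary_labels = (labels < 5).long()                  # binarize digits 0-9 into {0, 1}
        flip = torch.bernoulli(torch.full((len(labels),), color_flip_prob)).long()
        channel = binary_labels ^ flip                       # color channel per image
        colored = torch.zeros(len(images), 2, 28, 28)        # k = 2 channels, s = 28
        colored[torch.arange(len(images)), channel] = images
        return colored, binary_labels

    mnist = datasets.MNIST("data/", train=True, download=True)
    x, y = mnist.data, mnist.targets
    # Three environments with different (illustrative) correlation strengths.
    env1 = make_environment(x[0::3], y[0::3], color_flip_prob=0.1)
    env2 = make_environment(x[1::3], y[1::3], color_flip_prob=0.2)
    env3 = make_environment(x[2::3], y[2::3], color_flip_prob=0.9)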
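
The 0.8 / 0.1 / 0.1 split quoted under Dataset Splits is a plain random split. A minimal sketch follows, assuming the environment built in the previous sketch is wrapped in a TensorDataset; the fixed seed is an assumption, not something the paper states.

    import torch
    from torch.utils.data import TensorDataset, random_split

    images, labels = env1                        # environment from the previous sketch
    dataset = TensorDataset(images, labels)

    n = len(dataset)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    n_test = n - n_train - n_val                 # remainder absorbs rounding error

    train_set, val_set, test_set = random_split(
        dataset,
        [n_train, n_val, n_test],
        generator=torch.Generator().manual_seed(0),   # fixed seed (assumption)
    )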
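
The Experiment Setup row describes a Ray Tune search with a 100-epoch cap and a plateau criterion on the validation loss. The sketch below is a rough illustration, not the authors' tuning code: run_epoch is a hypothetical placeholder for one epoch of training plus validation, the search space shows only a subset of the quoted ranges, and the older tune.report / tune.run calls are assumed (recent Ray versions move reporting to ray.train.report).

    import numpy as np
    from ray import tune

    def run_epoch(config):
        # Hypothetical placeholder: one epoch of training plus validation on
        # the colored-MNIST environments; returns a dummy validation loss here.
        return float(np.random.rand())

    def trainable(config):
        val_losses = []
        for epoch in range(100):                          # hard cap of 100 epochs
            val_loss = run_epoch(config)
            val_losses.append(val_loss)
            tune.report(val_loss=val_loss)                # ray.train.report in recent Ray versions
            if len(val_losses) >= 10 and np.std(val_losses[-10:]) <= 0.01:
                break                                     # validation loss has plateaued

    # A subset of the quoted ranges, e.g. the loguniform(0.00001, 0.01)
    # learning-rate range used for the no-transfer and transfer runs.
    search_space = {
        "lr": tune.loguniform(1e-5, 1e-2),
        "momentum": 0.9,
        "weight_decay": 1e-4,
        "batch_size": 512,
    }

    analysis = tune.run(trainable, config=search_space, num_samples=20)
    print(analysis.get_best_config(metric="val_loss", mode="min"))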