When is Transfer Learning Possible?
Authors: My Phan, Kianté Brantley, Stephanie Milani, Soroush Mehri, Gokul Swamy, Geoffrey J. Gordon
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present case studies to illustrate how our work may be used to analyze and even challenge widely-held beliefs about transfer learning. We show that sparse mechanism shifts are neither necessary nor sufficient for transfer, and we show that freezing a layer of a network may either succeed or fail at transfer (Section 5). Experiment. Fig. 5 compares transfer performance of several methods. |
| Researcher Affiliation | Collaboration | Cornell University; Carnegie Mellon University; Elementera AI. |
| Pseudocode | Yes | Algorithm 1 Meta-Algorithm for Transfer |
| Open Source Code | No | The paper does not provide an explicit statement or a direct link to the open-source code for the methodology described in the paper. It mentions using Ray Tune and references a GitHub link for a dataset, but not for their own implementation. |
| Open Datasets | Yes | The dataset and the pre-processing code to generate multiple environments of colored images from the original MNIST dataset (from Gulrajani & Lopez-Paz, 2020) were used, with image size s = 28 and number of color channels k = 2. |
| Dataset Splits | Yes | We divide Environment 1 into train, val and test sets with ratio 0.8, 0.1, 0.1. (A minimal split sketch appears below the table.) |
| Hardware Specification | No | The experiments that produce the figures in this paper were performed on a personal laptop. This description is too general and does not provide specific details such as GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions using "Ray Tune" but does not specify its version number or any other key software dependencies with their versions, which are necessary for reproducibility. |
| Experiment Setup | Yes | We use Ray Tune to tune the hyperparameters. We sample the parameters from the configuration ranges below and run the training process until the number of epochs reaches 100 or the standard deviation of the validation loss over the last 10 epochs is at most 0.01. Train in Env. 1: learning rate loguniform(0.01, 5), momentum 0.9, weight decay 0.0001. Train in Env. 3 (No Transfer): learning rate loguniform(0.00001, 0.01). Transfer Coefficient: 0.0001. Transfer Layer, Transfer Random Coefficient: learning rate loguniform(0.00001, 0.01). Number of sampled hyperparameters np: 20, or 20 or 50 uniformly. Batch size: 512. (A hedged Ray Tune sketch of this setup follows the table.) |
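A minimal sketch of the quoted 0.8 / 0.1 / 0.1 split of Environment 1, assuming the colored-MNIST environment is available as a PyTorch `Dataset`. The random stand-in tensors, the fixed seed, and the use of `torch.utils.data.random_split` are our assumptions for illustration, not details from the paper.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for Environment 1 of the colored-MNIST data: 2-channel 28x28 images
# (s = 28, k = 2 as quoted above). Random tensors are used only to make the
# split runnable; the real data comes from the Gulrajani & Lopez-Paz code.
env1 = TensorDataset(
    torch.randn(10_000, 2, 28, 28),
    torch.randint(0, 10, (10_000,)),
)

# 0.8 / 0.1 / 0.1 split into train, val, and test sets.
n_train = int(0.8 * len(env1))
n_val = int(0.1 * len(env1))
n_test = len(env1) - n_train - n_val
train_set, val_set, test_set = random_split(
    env1,
    [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(0),  # fixed seed is our assumption
)
```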
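A hedged sketch of how the quoted Ray Tune search could look, assuming the classic functional API (`ray.tune.run`). The helpers `build_model`, `train_one_epoch`, and `eval_loss` are hypothetical placeholders; only the loguniform range, momentum, weight decay, the 100-epoch cap, the 0.01 stopping threshold on the last 10 validation losses, the batch size, and the sample count are taken from the reported setup.

```python
import statistics
from ray import tune


def trainable(config):
    model = build_model()  # placeholder for the paper's network (not specified here)
    val_losses = []
    for epoch in range(100):  # at most 100 epochs, per the reported setup
        train_one_epoch(  # placeholder training step
            model,
            lr=config["lr"],
            momentum=0.9,
            weight_decay=1e-4,
            batch_size=config["batch_size"],
        )
        val_losses.append(eval_loss(model, split="val"))  # placeholder evaluation
        tune.report(val_loss=val_losses[-1])  # newer Ray versions use ray.train.report
        # Reported stopping rule: std of the last 10 validation losses <= 0.01.
        if len(val_losses) >= 10 and statistics.pstdev(val_losses[-10:]) <= 0.01:
            break


# Search space mirroring the reported transfer-run ranges; "Train in Env. 1"
# would instead use tune.loguniform(0.01, 5).
search_space = {
    "lr": tune.loguniform(1e-5, 1e-2),
    "batch_size": 512,
}

analysis = tune.run(trainable, config=search_space, num_samples=20)
print(analysis.get_best_config(metric="val_loss", mode="min"))
```

The choice of 20 samples reflects the reported "number of sampled hyperparameters"; the report also mentions 20 or 50 drawn uniformly for some settings.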