Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
When is Transfer Learning Possible?
Authors: My Phan, Kianté Brantley, Stephanie Milani, Soroush Mehri, Gokul Swamy, Geoffrey J. Gordon
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present case studies to illustrate how our work may be used to analyze and even challenge widely-held beliefs about transfer learning. We show that sparse mechanism shifts are neither necessary nor sufficient for transfer, and we show that freezing a layer of a network may either succeed or fail at transfer (Section 5). Experiment. Fig. 5 compares transfer performance of several methods: |
| Researcher Affiliation | Collaboration | 1Cornell University 2Carnegie Mellon University 3Elementera AI. |
| Pseudocode | Yes | Algorithm 1 Meta-Algorithm for Transfer |
| Open Source Code | No | The paper does not provide an explicit statement or a direct link to the open-source code for the methodology described in the paper. It mentions using Ray Tune and references a GitHub link for a dataset, but not for their own implementation. |
| Open Datasets | Yes | Dataset and the pre-processing code to generate multiple environments of colored images from the original MNIST dataset from Gulrajani & Lopez-Paz (2020) were used with image size s = 28 and number of color channels k = 2. |
| Dataset Splits | Yes | We divide Environments 1 into train, val and test sets with ratio 0.8, 0.1, 0.1. |
| Hardware Specification | No | The experiments that produce the figures in this paper were performed on a personal laptop. This description is too general and does not provide specific details such as GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions using "Ray Tune" but does not specify its version number or any other key software dependencies with their versions, which are necessary for reproducibility. |
| Experiment Setup | Yes | We use Ray Tune to tune the hyperparameters. We sample the parameters from the configuration range in the table below, run the training process until the number of epochs is 100 or the standard deviation of the validation loss of the last 10 epochs is at most 0.01. Train in Env. 1: Learning rate loguniform(0.01, 5), Momentum 0.9, Weight decay 0.0001. Train in Env. 3 (No Transfer): Learning rate loguniform(0.00001, 0.01). Transfer Coefficient 0.0001. Transfer Layer, Transfer Random Coefficient: Learning rate loguniform(0.00001, 0.01). Number of sampled hyperparameters np: 20; 20 or 50 uniformly. Batch size 512. |
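The log-uniform sampling quoted in the Experiment Setup row can be illustrated with a short stdlib sketch; the `loguniform` helper and the `config` dictionary below are illustrative stand-ins for what Ray Tune's `tune.loguniform` search space would produce, not code from the paper.

```python
import math
import random

def loguniform(low, high, rng=random):
    """Sample a value log-uniformly between low and high,
    mimicking the behavior of Ray Tune's tune.loguniform."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

# Illustrative draw of one hyperparameter configuration from the
# ranges quoted in the Experiment Setup row above.
rng = random.Random(0)
config = {
    "lr_env1": loguniform(0.01, 5, rng),     # Train in Env. 1
    "lr_env3": loguniform(1e-5, 1e-2, rng),  # Train in Env. 3 (No Transfer)
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "batch_size": 512,
}
print(config)
```

A log-uniform draw spends equal probability mass per decade, which is why it is the usual choice for learning-rate search over ranges spanning several orders of magnitude.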
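The 0.8/0.1/0.1 split described in the Dataset Splits row can be sketched in plain Python; the `split_indices` helper and seed are illustrative assumptions, not the authors' implementation.

```python
import random

def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/val/test
    partitions according to the given ratios."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# MNIST has 70,000 images in total (train + test combined).
train, val, test = split_indices(70000)
print(len(train), len(val), len(test))  # 56000 7000 7000
```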