Understanding the Transferability of Representations via Task-Relatedness
Authors: Akshay Mehra, Yunbei Zhang, Jihun Hamm
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments using state-of-the-art pre-trained models show the effectiveness of task-relatedness in explaining transferability on various vision and language tasks. The efficient computability of task-relatedness even without labels of the target task and its high correlation with the model's accuracy after end-to-end fine-tuning on the target task makes it a useful metric for transferability estimation. Our empirical results of using task-relatedness on the problem of selecting the best pre-trained model from a model zoo for a target task highlight its utility for practical problems. |
| Researcher Affiliation | Academia | Akshay Mehra, Yunbei Zhang, and Jihun Hamm Tulane University {amehra, yzhang111, jhamm3}@tulane.edu |
| Pseudocode | Yes | Alg. 1 shows how we solve Eq. 4 (see App. D for additional details of the algorithm). Algorithm 1 Minimization of the bound in Theorem 3 Input: Reference task samples and labels (ZR, YR), Target task samples (ZT ), Target task labels (YT ) (optional). Output: Estimate of task-relatedness using the learned transformations A, A, B, D. |
| Open Source Code | Yes | Our codes can be found at https://github.com/akshaymehra24/TaskTransferAnalysis. |
| Open Datasets | Yes | Aircraft [35]: consists of 10,000 aircraft images belonging to 100 classes. CIFAR-10/100 [30]: These datasets contain 60,000 images belonging to 10/100 categories. DTD [13]: consists of 5,640 textural images belonging to 47 categories. Fashion MNIST [60]: consists of 70,000 grayscale images belonging to 10 categories. Pets [43]: consists of 7,049 images of Cats and Dogs spread across 37 categories. ImageNet [17]: consists of 1.1 million images belonging to 1000 categories. Yelp [63]: consists of 650,000 training and 50,000 test examples belonging to 5 classes. Stanford Sentiment Treebank (SST-5) [63]: consists of 8,544 training and 2,210 test samples belonging to 5 classes. AG News [63]: consists of 120,000 training and 7,600 test examples belonging to 4 classes. DBPedia [63]: consists of 560,000 training and 70,000 test examples belonging to 14 classes. |
| Dataset Splits | Yes | Yelp [63]: consists of 650,000 training and 50,000 test examples belonging to 5 classes. For evaluation, we use a similar subsample of the validation dataset of ImageNet containing all the samples belonging to the subsampled classes. |
| Hardware Specification | Yes | All codes are written in Python using Tensorflow/Pytorch and were run on an Intel(R) Xeon(R) Platinum 8358 CPU with 200 GB of RAM and an Nvidia A10 GPU. |
| Software Dependencies | No | The paper mentions using "Python", "Tensorflow", and "Pytorch" but does not specify their version numbers or the versions of any other key libraries or dependencies required for reproduction. |
| Experiment Setup | Yes | Implementation and hyperparameters are described below. All codes are written in Python using Tensorflow/Pytorch... We use a batch size of 1000 for ResNet18 models (representation dimension 512) and 2500 for ResNet50 models (representation dimension 2048). Fine-tuning runs for a total of 5000 epochs. Linear classifiers are trained with a gradient norm penalty (with τ = 0.02). |
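The experiment-setup row reports that linear classifiers are trained on frozen representations with a gradient norm penalty (τ = 0.02). The paper's exact penalty formulation is not quoted here, so the sketch below is an assumption: it penalizes the L2 norm of the loss gradient with respect to the input representations, a common form of this regularizer. The function name `train_linear_probe` and all hyperparameters other than τ are hypothetical.

```python
import torch
import torch.nn as nn

def train_linear_probe(features, labels, num_classes, tau=0.02,
                       epochs=10, lr=1e-3):
    """Train a linear classifier on frozen features with a gradient
    norm penalty on the input representations (assumed penalty form;
    tau = 0.02 follows the reported setup)."""
    clf = nn.Linear(features.shape[1], num_classes)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        # Track gradients w.r.t. the (otherwise frozen) representations.
        x = features.clone().requires_grad_(True)
        loss = ce(clf(x), labels)
        # Gradient of the loss w.r.t. the representations (double backprop).
        (g,) = torch.autograd.grad(loss, x, create_graph=True)
        penalty = g.norm(p=2, dim=1).mean()
        opt.zero_grad()
        (loss + tau * penalty).backward()
        opt.step()
    return clf
```

With the batch sizes from the table, `features` would be a 1000×512 tensor for ResNet18 representations or a 2500×2048 tensor for ResNet50.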