LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views
Authors: Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Broad experiments with large language and vision models show that LEVI greatly improves fine-tuning generalization via emphasizing different views from fine-tuning data and pre-trained features. |
| Researcher Affiliation | Collaboration | 1 Google Inc, Mountain View, CA, USA; 2 Google DeepMind, Mountain View, CA, USA; 3 School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. |
| Pseudocode | No | The paper describes the framework in Section 4 and illustrates it with Figure 3, but it does not include explicit 'Pseudocode' or 'Algorithm' blocks (a hedged Flax sketch of the layer-wise ensemble idea appears below the table). |
| Open Source Code | No | The paper states: 'We use TensorFlow (Abadi et al., 2015) with JAX (Bradbury et al., 2018) and Flax (Heek et al., 2023).' and provides links to these third-party libraries, but there is no explicit statement about releasing the code for the methodology described in this paper. |
| Open Datasets | Yes | For language-based recommendation, we use MovieLens (Harper & Konstan, 2015) and Amazon Review (Ni et al., 2019) ... For vision, we use Diabetic Retinopathy (Medical) (Emma Dugas, 2015) and ImageNet Variants (Wang et al., 2019; Hendrycks et al., 2021a;b; Recht et al., 2019). All datasets are from TensorFlow Datasets (TFDS). ... TensorFlow Datasets, a collection of ready-to-use datasets. https://www.tensorflow.org/datasets. (A loading sketch appears below the table.) |
| Dataset Splits | Yes | Settings. We consider three data distributions each for pre-training, fine-tuning, and testing. Pre-training data and test data are not available, and we only have fine-tuning data and pre-trained model features. When we refer to training data, we mean the fine-tuning data. We consider the fine-tuning distribution as in-distribution (ID), and the test distribution as out-of-distribution (OOD). Thus, ID data represents the samples on which the model has been trained, and OOD data represents unfamiliar samples not seen during training. For each algorithm, we choose the best hyperparameters from the above candidate sets to achieve the best performance on the in-distribution validation set, while not accessing the out-of-distribution datasets. (A sketch of this selection protocol appears below the table.) |
| Hardware Specification | Yes | In all experiments, we use Dragonfish TPU (i.e., TPUv3) and Jellyfish TPU (i.e., TPUv2) with 2x2 topology for T5x and ViT experiments, respectively. (A device-count check appears below the table.) |
| Software Dependencies | No | The paper states 'we use TensorFlow (Abadi et al., 2015) with JAX (Bradbury et al., 2018) and Flax (Heek et al., 2023).' However, it does not provide specific version numbers for these software dependencies, only citations to their original papers. |
| Experiment Setup | Yes | Here are common hyperparameters and settings for all algorithms. We use the Adam optimizer and SGD optimizer for T5x and ViT experiments, respectively. For batch sizes, we use 200 for MovieLens, 100 for Amazon Review, and 512 for all computer vision datasets. For learning rates, we consider a set {0.0001, 0.001, 0.01, 0.1} for all algorithms except linear probing. In linear probing, we use a learning rate set with larger values {0.001, 0.01, 0.1, 1.0, 10.0}. (A configuration sketch appears below the table.) |
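Since the paper ships no pseudocode, the following Flax sketch is only one plausible reading of the layer-wise ensembling idea described in Section 4: a small trainable model whose per-layer outputs are mixed with projections of frozen pre-trained features. All module names, shapes, and the additive mixing rule are assumptions, not the authors' implementation.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class LayerwiseEnsemble(nn.Module):
    """Hedged sketch: a small trainable model whose per-layer outputs are
    combined with frozen pre-trained features at matching depths. Names,
    shapes, and the additive mixing rule are assumptions."""
    hidden_dim: int = 128
    num_layers: int = 3
    num_classes: int = 10

    @nn.compact
    def __call__(self, x, pretrained_feats):
        # pretrained_feats: one frozen intermediate feature per layer of
        # the small model (treated as fixed inputs, not trained here).
        h = x
        for i in range(self.num_layers):
            h = nn.relu(nn.Dense(self.hidden_dim)(h))
            # Project the frozen pre-trained feature to the same width
            # and mix it additively with the task-specific representation.
            g = nn.Dense(self.hidden_dim)(pretrained_feats[i])
            h = h + g
        return nn.Dense(self.num_classes)(h)

# Usage with dummy shapes (all illustrative):
model = LayerwiseEnsemble()
x = jnp.ones((4, 64))
feats = [jnp.ones((4, 32)) for _ in range(3)]
params = model.init(jax.random.PRNGKey(0), x, feats)
logits = model.apply(params, x, feats)  # shape (4, 10)
```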
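Because every dataset is taken from TensorFlow Datasets, the experimental data can be pulled from the public catalog. A minimal loading sketch; the dataset names below exist in the TFDS catalog, but the chosen splits are illustrative rather than the paper's exact configuration.

```python
import tensorflow_datasets as tfds

# Catalog names exist in TFDS; splits here are illustrative.
movielens = tfds.load('movielens/100k-ratings', split='train')
retinopathy = tfds.load('diabetic_retinopathy_detection', split='train')
imagenet_r = tfds.load('imagenet_r', split='test')  # one ImageNet variant
```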
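The hyperparameter protocol quoted above (tune on an in-distribution validation split, never on OOD data) reduces to a short loop. `train_and_eval_id` and `eval_ood` are hypothetical stand-ins for the actual training pipeline.

```python
# Hypothetical stand-ins for the real training/evaluation pipeline.
def train_and_eval_id(lr: float) -> float:
    """Train with learning rate `lr`; return ID-validation accuracy."""
    return 0.0  # placeholder

def eval_ood(lr: float) -> float:
    """Evaluate the model trained with `lr` once on the OOD test set."""
    return 0.0  # placeholder

learning_rates = [0.0001, 0.001, 0.01, 0.1]  # grid quoted in the table

# Selection uses only ID validation; OOD data is touched once, afterwards.
best_lr = max(learning_rates, key=train_and_eval_id)
ood_accuracy = eval_ood(best_lr)
```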
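A quick in-program check of the accelerator slice; the expected count of 8 devices is an inference from the stated 2x2 topology (4 chips, 2 cores per chip on TPU v2/v3), not a figure from the paper.

```python
import jax

# On a 2x2 TPU v2/v3 slice, JAX typically reports 8 TPU cores;
# on a CPU-only machine this prints a single CPU device instead.
print(jax.devices())
print(jax.device_count())
```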
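A configuration sketch of the reported setup. The paper names the optimizers and learning-rate grids but not an optimizer library; Optax is assumed here as the usual choice in the JAX ecosystem, and the specific learning-rate values are single points from the quoted grids.

```python
import optax

# Optimizer families as reported: Adam for T5x runs, SGD for ViT runs.
t5x_optimizer = optax.adam(learning_rate=1e-3)  # from {1e-4, 1e-3, 1e-2, 1e-1}
vit_optimizer = optax.sgd(learning_rate=1e-2)

# Per-dataset batch sizes as reported.
batch_sizes = {
    'movielens': 200,
    'amazon_review': 100,
    'vision': 512,  # all computer-vision datasets
}

# Linear probing sweeps a larger grid, as quoted above.
linear_probe_lrs = [0.001, 0.01, 0.1, 1.0, 10.0]
```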