Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data
Authors: Cheng-Hao Tu, Hong-You Chen, Zheda Mai, Jike Zhong, Vardaan Pahuja, Tanya Berger-Wolf, Song Gao, Charles Stewart, Yu Su, Wei-Lun (Harry) Chao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we construct benchmark datasets and conduct extensive experiments to uncover the inherent challenges. |
| Researcher Affiliation | Academia | 1The Ohio State University, 2University of Wisconsin-Madison, 3Rensselaer Polytechnic Institute |
| Pseudocode | No | The paper describes methods like LOLSGD but does not present them in a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement offering open-source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Office-Home [74]: domain adaptation while some classes are missing in the target training set. FEMNIST [4]: personalized hand-written alphanumeric recognition with writers' styles. [...] iWildCam [38]: species recognition across camera traps of many geo-locations. [...] VTAB [91]: fine-tuning zero-shot models for diverse vision tasks with partial classes each. iNaturalist (2021 version, Fungi) [73]: classification of visually-similar poisonous fungi. |
| Dataset Splits | No | The paper primarily describes train/test splits (e.g., 'randomly split the data of each class into training and test sets with a ratio of 7:3') but does not explicitly mention or detail a distinct validation dataset split. |
| Hardware Specification | Yes | We conduct our experiments on PyTorch and on NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with specific version numbers. |
| Experiment Setup | Yes | All methods use the cross-entropy loss for L and the SGD momentum optimizer. All experiments fine-tune the source model for 20 epochs (10 for FEMNIST) by default. For the regularizers in section 3, we attach them as L + λ_distill (or rank) L_distill (or rank), where the weights λ are quite stable, so we did not search for them exhaustively for every method but use the same ones per dataset. For the proposed LOLSGD, we set M = 10 and randomly drop 3 classes when sampling Y^m_T in Equation 5. Each subgradient in LOLSGD is obtained by local SGD (1/M epoch), and we run the same total number of epochs for a fair computation budget. Please see the supplementary for the details of setup, hyperparameters, and more analyses. [A hypothetical sketch of this setup follows the table.] |
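For concreteness, below is a minimal, hypothetical PyTorch sketch of the fine-tuning round described in the Experiment Setup row: M = 10 local-SGD branches per round, each trained for roughly 1/M of an epoch with 3 classes randomly dropped when sampling Y^m_T. The paper provides no code or pseudocode for LOLSGD, so the function name `lolsgd_round`, the `drop_k` parameter, and the weight-averaging aggregation at the end are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of one LOLSGD-style round (not the authors' code).
# Assumption: the M leave-out branches start from the same weights each round
# and their resulting weights are averaged, as in local SGD.
import copy
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset

def lolsgd_round(model, dataset, labels, num_classes, M=10, drop_k=3,
                 lr=1e-3, momentum=0.9, batch_size=64, device="cpu"):
    """One round (~1 epoch of total compute): M local-SGD branches of ~1/M epoch
    each, every branch trained with `drop_k` classes randomly dropped, then averaged."""
    criterion = nn.CrossEntropyLoss()  # cross-entropy loss L, as stated in the paper
    branch_states = []
    for _ in range(M):
        # Sample the label subset Y^m_T by dropping `drop_k` classes at random.
        dropped = set(random.sample(range(num_classes), drop_k))
        idx = [i for i, y in enumerate(labels) if y not in dropped]
        loader = DataLoader(Subset(dataset, idx), batch_size=batch_size, shuffle=True)

        branch = copy.deepcopy(model).to(device)
        opt = torch.optim.SGD(branch.parameters(), lr=lr, momentum=momentum)

        # Local SGD for roughly 1/M of an epoch's worth of mini-batches.
        steps = max(1, len(loader) // M)
        for step, (x, y) in enumerate(loader):
            if step >= steps:
                break
            opt.zero_grad()
            loss = criterion(branch(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
        branch_states.append(branch.state_dict())

    # Assumed aggregation rule: average branch weights back into the shared model.
    averaged = {k: torch.stack([s[k].float() for s in branch_states]).mean(0)
                for k in branch_states[0]}
    model.load_state_dict(averaged)
    return model
```

Repeating `lolsgd_round` for 20 rounds (10 for FEMNIST) would match the stated total epoch budget, since each round costs roughly one epoch of computation in this sketch.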