Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data

Authors: Cheng-Hao Tu, Hong-You Chen, Zheda Mai, Jike Zhong, Vardaan Pahuja, Tanya Berger-Wolf, Song Gao, Charles Stewart, Yu Su, Wei-Lun (Harry) Chao

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental we construct benchmark datasets and conduct extensive experiments to uncover the inherent challenges.
Researcher Affiliation Academia 1The Ohio State University, 2University of Wisconsin-Madison, 3Rensselaer Polytechnic Institute
Pseudocode No The paper describes methods like LOLSGD but does not present them in a formal pseudocode or algorithm block.
Open Source Code No The paper does not contain an explicit statement offering open-source code for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets Yes Office-Home [74]: domain adaptation while some classes are missing in the target training set. FEMNIST [4]: personalized hand-written alphanumeric recognition with writers' styles. [...] iWildCam [38]: species recognition across camera traps of many geo-locations. [...] VTAB [91]: fine-tuning zero-shot models for diverse vision tasks with partial classes each. iNaturalist (2021 version, Fungi) [73]: classification of visually-similar poisonous fungi.
Dataset Splits No The paper describes train/test splits (e.g., 'randomly split the data of each class into training and test sets with a ratio of 7:3') but does not explicitly mention or detail a distinct validation split.
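The quoted 7:3 per-class split is straightforward to reproduce. A minimal sketch follows; the paper does not release its split code, so the function name, the seed handling, and the `data_by_class` input format are all assumptions made here for illustration.

```python
import random

def split_per_class(data_by_class, train_ratio=0.7, seed=0):
    """Split each class's examples into train/test sets at a fixed ratio,
    mirroring the paper's stated 7:3 per-class protocol (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for cls, items in data_by_class.items():
        items = list(items)
        rng.shuffle(items)                          # randomize within the class
        k = int(round(len(items) * train_ratio))    # 70% of this class to train
        train += [(x, cls) for x in items[:k]]
        test += [(x, cls) for x in items[k:]]
    return train, test
```

Splitting within each class (rather than over the pooled data) keeps the class proportions identical in the train and test sets.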
Hardware Specification Yes We conduct our experiments on PyTorch and on NVIDIA V100 GPUs.
Software Dependencies No The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with specific version numbers.
Experiment Setup Yes All methods use the cross-entropy loss for L and the SGD optimizer with momentum. All experiments fine-tune the source model for 20 epochs (10 for FEMNIST) by default. For the regularizers in section 3, we attach them as L + λ_distill (or rank) · L_distill (or rank); the weights λ are quite stable, so we did not search for them exhaustively for every method but use the same ones per dataset. For the proposed LOLSGD, we set M = 10 and randomly drop 3 classes when sampling Y_T^m in Equation 5. Each subgradient in LOLSGD is computed by local SGD (1/M of an epoch), and we run the same total number of epochs for a fair computation budget. Please see the supplementary material for details of the setup, hyperparameters, and more analyses.
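As noted above, the paper gives no pseudocode for LOLSGD, so the following is only a hedged sketch of how the described procedure might look: M local SGD passes, each of roughly 1/M of an epoch on data whose labels exclude a few randomly dropped classes (the sampled Y_T^m), with the resulting models averaged. The function names, the averaging step, and the `local_sgd` callback interface are assumptions, not the authors' implementation.

```python
import random
import numpy as np

def lolsgd_round(weights, classes, data_by_class, local_sgd, M=10, n_drop=3):
    """One LOLSGD-style round (sketch): M leave-out local-SGD updates, averaged.

    weights       -- flat parameter vector (np.ndarray)
    classes       -- list of class labels in the target training set
    data_by_class -- dict mapping each class to its training examples
    local_sgd     -- callback running ~1/M epoch of SGD, returning new weights
    """
    updated = []
    for _ in range(M):
        dropped = set(random.sample(classes, n_drop))      # sample Y_T^m by leaving classes out
        kept = [c for c in classes if c not in dropped]
        subset = [x for c in kept for x in data_by_class[c]]
        updated.append(local_sgd(weights.copy(), subset))  # one local-SGD subgradient pass
    return np.mean(updated, axis=0)                        # combine the M local models
```

Averaging M models trained on different leave-out label subsets is one natural reading of "each subgradient ... is by local SGD"; the paper's Equation 5 should be consulted for the exact combination rule.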