ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections

Authors: Massimo Bini, Karsten Roth, Zeynep Akata, Anna Khoreva

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters (~10-100 times lower than LoRA or OFT) across multiple image synthesis and natural language tasks without exhaustive hyperparameter tuning.
Researcher Affiliation | Collaboration | 1 Bosch IoC Lab, University of Tübingen; 2 Helmholtz Munich; 3 Tübingen AI Center, University of Tübingen; 4 Technical University of Munich; 5 Munich Center for Machine Learning; 6 Bosch Center for Artificial Intelligence.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). (See the illustrative sketch after this table.)
Open Source Code | Yes | The code is available at https://github.com/mwbini/ether.
Open Datasets | Yes | For our experiments on diffusion-based generative models, we apply the finetuning methods on the pretrained Stable Diffusion-v1.5 (Rombach et al., 2022), following the setting from OFT (Qiu et al., 2023). Our experiments follow best practices and hyperparameter choices for each method. For implementation details, please refer to App. C.
Dataset Splits | Yes | Evaluations are performed on 2000 images generated from the validation set using mean Intersection-over-Union (mIoU) and accuracy of semantic maps over generated images using UperNet-101 (Xiao et al., 2018) pretrained on ADE20K.
Hardware Specification | Yes | We perform the training on a Tesla V100-32GB GPU. (...) We perform all the training runs on a single Nvidia-A100-40GB with a batch size of 10. (...) All training runs are conducted on a single Nvidia-A100-40GB GPU, but could also be run on a consumer NVIDIA GeForce-RTX-3090-24G GPU.
Software Dependencies | No | The paper mentions software components such as the 'peft' Hugging Face repository and the 'lit-gpt' repository, but does not specify their version numbers or the versions of other relevant libraries or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | For DreamBooth and OFT, we follow the original implementations and use a learning rate of 5 × 10⁻⁶ and 6 × 10⁻⁵ respectively, with a batch size of 1. For Naive (the non-orthogonal OFT variant) we use the same setting as OFT for a fair comparison. For LoRA we select a learning rate of 6 × 10⁻⁴. For ETHER and ETHER+, we use a learning rate of 6 × 10⁻³. (...) for OFT and Naive we use a learning rate of 1 × 10⁻⁵. For ETHER and ETHER+ we use a larger learning rate of 1 × 10⁻³. For all experiments, we upper bound the learning rate of the signal encoder to 1 × 10⁻⁴. (...) The relevant hyperparameters for each task are reported in Tab. 8. All training runs are conducted on a single Nvidia-A100-40GB GPU. (Table 7 lists Learning Rate, Batch Size, Num. Epochs, Dropout, and Max Seq. Len for the GLUE tasks.)
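The paper's core operation is a multiplicative hyperplane (Householder) reflection applied to frozen pretrained weights, with only the reflection vector trained per layer. Since the paper itself provides no pseudocode, the following is a minimal illustrative sketch only: the `householder` helper, the `EtherLinear` wrapper, its initialization, and the layer dimensions are assumptions for exposition, not the authors' released implementation (see https://github.com/mwbini/ether for the actual code), and the ETHER+ relaxation and block-diagonal variants are omitted.

```python
import torch

def householder(u: torch.Tensor) -> torch.Tensor:
    """Hyperplane reflection H = I - 2 u u^T / ||u||^2 (orthogonal by construction)."""
    u = u / u.norm()
    d = u.numel()
    return torch.eye(d, device=u.device, dtype=u.dtype) - 2.0 * torch.outer(u, u)

class EtherLinear(torch.nn.Module):
    """Hypothetical wrapper: freezes a pretrained linear layer and trains only
    the reflection direction u, applying W' = H W at forward time."""
    def __init__(self, base: torch.nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        # single trainable d_out-dimensional vector; only its direction matters,
        # since householder() normalizes it
        self.u = torch.nn.Parameter(0.01 * torch.randn(base.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        H = householder(self.u)              # (d_out, d_out) reflection
        W = H @ self.base.weight             # multiplicatively transformed frozen weight
        return torch.nn.functional.linear(x, W, self.base.bias)

# usage: wrap a layer and count trainable parameters
layer = EtherLinear(torch.nn.Linear(768, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 768
```

In this sketch each wrapped layer adds only a d-dimensional vector of trainable parameters, which is consistent with the roughly 10-100× parameter reduction over LoRA and OFT quoted in the Research Type row above.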