Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models

Authors: Fangzhao Zhang, Mert Pilanci

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results with both large language models and text-to-image diffusion models show that with this new preconditioner, the convergence and reliability of SGD and AdamW can be significantly enhanced.
Researcher Affiliation | Academia | Department of Electrical Engineering, Stanford University.
Pseudocode | Yes | We outline the pseudocode of our scaled AdamW in Algorithm 1. (A hedged sketch of the scaled update appears after this table.)
Open Source Code | Yes | Code can be accessed at https://github.com/pilancilab/Riemannian_Preconditioned_LoRA.
Open Datasets | Yes | We exploit the new preconditioner for LoRA fine-tuning of GPT-2 models... E2E (Novikova et al., 2017) natural language generation challenge. We experiment with the Mix-of-Show model (Gu et al., 2023) which can generate high-quality face images. (A dataset-loading sketch appears after this table.)
Dataset Splits | No | The paper uses standard benchmarks like E2E and GLUE, which typically have predefined splits. However, it does not explicitly state the specific training, validation, or test splits (e.g., percentages or counts) used in the experiments.
Hardware Specification | Yes | Figure 2 shows the runtime used for different optimizers for the fine-tuning task trained on NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions software like PyTorch and Hugging Face transformers, but does not provide specific version numbers for these or other dependencies.
Experiment Setup | Yes | Table 3 lists hyperparameters for GPT-2 model fine-tuning, including weight decay 0.01, dropout probability 0.1, batch size 8, 5 epochs, 500 warmup steps, a linear LR scheduler, label smoothing 0.1, tuned learning rates, AdamW β1/β2, and LoRA α 32. (These values are collected in the config sketch after this table.)
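
The Pseudocode row refers to the paper's scaled AdamW (Algorithm 1). The snippet below is only a minimal PyTorch sketch of the core idea, the r×r Riemannian preconditioning of the LoRA factor gradients, shown as a plain scaled gradient step rather than the full AdamW variant; the function name, learning rate, and damping term `delta` are illustrative assumptions, not the authors' released code.

```python
import torch

def scaled_lora_step(A, B, lr=2e-4, delta=1e-6):
    """One preconditioned gradient step on LoRA factors.

    A: (r, n) tensor with .grad populated; B: (m, r) tensor with .grad populated.
    Each factor's gradient is scaled by the (damped) inverse Gram matrix of the
    other factor, an r x r preconditioner, before the update.
    """
    r = A.shape[0]
    damp = delta * torch.eye(r, device=A.device, dtype=A.dtype)
    with torch.no_grad():
        # Left-precondition grad_A by (B^T B + delta*I)^{-1}.
        gA = torch.linalg.solve(B.T @ B + damp, A.grad)
        # Right-precondition grad_B by (A A^T + delta*I)^{-1} (symmetric matrix).
        gB = torch.linalg.solve(A @ A.T + damp, B.grad.T).T
        A -= lr * gA
        B -= lr * gB
        A.grad = None
        B.grad = None
```

In a training loop one would compute the loss through the adapted weight W0 + B @ A, call loss.backward(), and then apply a step like the one above to every LoRA pair; the paper's Algorithm 1 feeds the same scaled gradients into the usual AdamW moment estimates and weight decay.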
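
The Open Datasets and Dataset Splits rows name the E2E NLG challenge and GLUE benchmarks. As a hedged illustration only, both are obtainable through the Hugging Face `datasets` library; the Hub identifiers used below (`e2e_nlg`, `glue`/`sst2`) are assumptions about the current dataset naming, not something the paper or this report specifies.

```python
from datasets import load_dataset  # Hugging Face datasets library

# E2E NLG challenge (Novikova et al., 2017); Hub identifier assumed to be "e2e_nlg".
e2e = load_dataset("e2e_nlg")

# One GLUE task (SST-2) as an example of the encoder-model benchmarks mentioned above.
sst2 = load_dataset("glue", "sst2")

# Inspect the predefined splits and their sizes as provided by the Hub.
print(e2e)
print(sst2)
```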
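
The Experiment Setup row quotes Table 3 of the paper. The dictionary below simply collects those reported GPT-2 fine-tuning values in one place; the key names are illustrative, and the tuned learning rates and AdamW betas are left as placeholders because their specific values are not quoted in this report.

```python
# GPT-2 LoRA fine-tuning hyperparameters as quoted from Table 3 of the paper.
gpt2_e2e_config = {
    "weight_decay": 0.01,
    "dropout_prob": 0.1,
    "batch_size": 8,
    "num_epochs": 5,
    "warmup_steps": 500,
    "lr_scheduler": "linear",
    "label_smoothing": 0.1,
    "lora_alpha": 32,
    # "learning_rate": ...,  # tuned per optimizer; values given in the paper's Table 3
    # "adamw_betas": ...,    # reported in Table 3 but not quoted in this report
}
```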