Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models
Authors: Fangzhao Zhang, Mert Pilanci
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results with both large language models and text-to-image diffusion models show that with this new preconditioner, the convergence and reliability of SGD and AdamW can be significantly enhanced. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, Stanford University. |
| Pseudocode | Yes | We outline the pseudocode of our scaled AdamW in Algorithm 1. (A hedged sketch of the underlying gradient scaling appears below the table.) |
| Open Source Code | Yes | Code can be accessed at https://github.com/pilancilab/Riemannian_Preconditioned_LoRA. |
| Open Datasets | Yes | We exploit the new preconditioner for LoRA fine-tuning of GPT-2 models... E2E (Novikova et al., 2017) natural language generation challenge. We experiment with the Mix-of-Show model (Gu et al., 2023) which can generate high-quality face images. |
| Dataset Splits | No | The paper uses standard benchmarks like E2E and GLUE, which typically have predefined splits. However, it does not explicitly state the specific training, validation, or test dataset splits (e.g., percentages or counts) used for reproduction. |
| Hardware Specification | Yes | Figure 2 shows the runtime used for different optimizers for the fine-tuning task trained on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch and Hugging Face transformers, but does not provide specific version numbers for these or other dependencies. |
| Experiment Setup | Yes | Table 3 lists hyperparameters for GPT-2 model fine-tuning including Weight Decay 0.01, Dropout Prob 0.1, Batch Size 8, # Epoch 5, Warmup Steps 500, LR Scheduler Linear, Label Smooth 0.1, tuned Learning Rates, AdamW β1/β2, and LoRA α 32. (These quoted values are collected into an illustrative config sketch below.) |
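
The pseudocode row above refers to the paper's Algorithm 1 (scaled AdamW). The snippet below is only a minimal PyTorch sketch of the underlying Riemannian gradient scaling, assuming a LoRA update ΔW = B A with B of shape (m, r) and A of shape (r, n), and shown for plain SGD rather than the paper's full AdamW variant. The function name `scaled_sgd_step` and the `lr`/`delta` arguments are illustrative, not the authors' API.

```python
import torch

def scaled_sgd_step(A, B, lr=1e-3, delta=1e-8):
    """Illustrative SGD step on LoRA factors with Riemannian-style
    gradient scaling (sketch only, not the paper's exact Algorithm 1).

    A: (r, n) tensor, B: (m, r) tensor, so delta_W = B @ A.
    A.grad and B.grad are assumed to be populated by a prior backward().
    """
    r = A.shape[0]
    eye = torch.eye(r, device=A.device, dtype=A.dtype)
    with torch.no_grad():
        # Precondition the gradient of A by (B^T B + delta I)^{-1} on the left
        # and the gradient of B by (A A^T + delta I)^{-1} on the right.
        grad_A = torch.linalg.solve(B.T @ B + delta * eye, A.grad)
        grad_B = torch.linalg.solve(A @ A.T + delta * eye, B.grad.T).T
        A -= lr * grad_A
        B -= lr * grad_B
        A.grad = None
        B.grad = None
```

In the paper the same r × r scaling is folded into the AdamW moment updates; it is kept outside any optimizer state here only to keep the sketch short.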
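
For the experiment-setup row, the quoted Table 3 values can be gathered into a small configuration object. The dictionary below is a hypothetical arrangement (key names are my own); the learning rate and AdamW betas are left unset because the quote reports them only as tuned values.

```python
# Hypothetical config mirroring the GPT-2 / E2E hyperparameters quoted above.
# Key names are illustrative; only the values quoted from Table 3 are filled in.
gpt2_e2e_config = {
    "weight_decay": 0.01,
    "dropout_prob": 0.1,
    "batch_size": 8,
    "num_epochs": 5,
    "warmup_steps": 500,
    "lr_scheduler": "linear",
    "label_smoothing": 0.1,
    "lora_alpha": 32,
    "learning_rate": None,  # tuned per task in the paper; not quoted here
    "adamw_betas": None,    # listed in the paper's Table 3; not quoted here
}
```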