Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models
Authors: Fangzhao Zhang, Mert Pilanci
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results with both large language models and text-to-image diffusion models show that with this new preconditioner, the convergence and reliability of SGD and AdamW can be significantly enhanced. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, Stanford University. |
| Pseudocode | Yes | We outline the pseudocode of our scaled AdamW in Algorithm 1. (A hedged sketch of the underlying gradient scaling appears below the table.) |
| Open Source Code | Yes | Code can be accessed at https://github.com/pilancilab/Riemannian_Preconditioned_LoRA. |
| Open Datasets | Yes | We exploit the new preconditioner for LoRA fine-tuning of GPT-2 models... E2E (Novikova et al., 2017) natural language generation challenge. We experiment with the Mix-of-Show model (Gu et al., 2023) which can generate high-quality face images. |
| Dataset Splits | No | The paper uses standard benchmarks like E2E and GLUE, which typically have predefined splits. However, it does not explicitly state the specific training, validation, or test dataset splits (e.g., percentages or counts) used for reproduction. |
| Hardware Specification | Yes | Figure 2 shows the runtime used for different optimizers for the fine-tuning task trained on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch and Hugging Face transformers, but does not provide specific version numbers for these or other dependencies. |
| Experiment Setup | Yes | Table 3 lists hyperparameters for GPT-2 model fine-tuning including Weight Decay 0.01, Dropout Prob 0.1, Batch Size 8, # Epoch 5, Warmup Steps 500, LR Scheduler Linear, Label Smooth 0.1, tuned Learning Rates, AdamW β1/β2, and LoRA α 32. (These quoted values are collected into an illustrative config sketch below.) |
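
The pseudocode row above refers to the paper's Algorithm 1 (scaled AdamW). The snippet below is only a minimal PyTorch sketch of the underlying Riemannian gradient scaling, assuming a LoRA update ΔW = B A with B of shape (m, r) and A of shape (r, n), and shown for plain SGD rather than the paper's full AdamW variant. The function name `scaled_sgd_step` and the `lr`/`delta` arguments are illustrative, not the authors' API.

```python
import torch

def scaled_sgd_step(A, B, lr=1e-3, delta=1e-8):
    """Illustrative SGD step on LoRA factors with Riemannian-style
    gradient scaling (sketch only, not the paper's exact Algorithm 1).

    A: (r, n) tensor, B: (m, r) tensor, so delta_W = B @ A.
    A.grad and B.grad are assumed to be populated by a prior backward().
    """
    r = A.shape[0]
    eye = torch.eye(r, device=A.device, dtype=A.dtype)
    with torch.no_grad():
        # Precondition the gradient of A by (B^T B + delta I)^{-1} on the left
        # and the gradient of B by (A A^T + delta I)^{-1} on the right.
        grad_A = torch.linalg.solve(B.T @ B + delta * eye, A.grad)
        grad_B = torch.linalg.solve(A @ A.T + delta * eye, B.grad.T).T
        A -= lr * grad_A
        B -= lr * grad_B
        A.grad = None
        B.grad = None
```

In the paper the same r × r scaling is folded into the AdamW moment updates; it is kept outside any optimizer state here only to keep the sketch short.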
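
For the experiment-setup row, the quoted Table 3 values can be gathered into a small configuration object. The dictionary below is a hypothetical arrangement (key names are my own); the learning rate and AdamW betas are left unset because the quote reports them only as tuned values.

```python
# Hypothetical config mirroring the GPT-2 / E2E hyperparameters quoted above.
# Key names are illustrative; only the values quoted from Table 3 are filled in.
gpt2_e2e_config = {
    "weight_decay": 0.01,
    "dropout_prob": 0.1,
    "batch_size": 8,
    "num_epochs": 5,
    "warmup_steps": 500,
    "lr_scheduler": "linear",
    "label_smoothing": 0.1,
    "lora_alpha": 32,
    "learning_rate": None,  # tuned per task in the paper; not quoted here
    "adamw_betas": None,    # listed in the paper's Table 3; not quoted here
}
```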