Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning
Authors: Arian Raje, Baris Askin, Divyansh Jhunjhunwala, Gauri Joshi
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on vision and language benchmarks show that RAVAN improves test accuracy by 2-8% over prior parameter-efficient baselines, making it a robust and scalable solution for federated fine-tuning of LLMs. ... Section 4: Experiments |
| Researcher Affiliation | Academia | Arian Raje, Baris Askin, Divyansh Jhunjhunwala, Gauri Joshi; Department of Electrical and Computer Engineering, Carnegie Mellon University. Corresponding author: EMAIL |
| Pseudocode | Yes | The pseudocode of the proposed method RAVAN is given in Algorithm 1, and the following sections highlight key components of our framework. ... Algorithm 1: RAVAN |
| Open Source Code | Yes | Our code base is provided in the supplementary material zip file with a README that includes instructions on run commands for our method and all baselines. |
| Open Datasets | Yes | For image classification, we adopt ViT-B/16 [11] (85M parameters) and fine-tune on two benchmarks: (i) CIFAR-100 (50,000 train / 10,000 test images, 100 classes) and (ii) SVHN (73,250 train / 26,032 test digits, 10 classes). For natural-language tasks, we fine-tune T5-Base [31] (224M parameters) on (i) 20 Newsgroups [28] (11,300 train / 7,532 test articles, 20 topics) and (ii) MRQA [14] (516,800 train / 58,221 test examples). ... Scaling to Larger Model Architectures. We demonstrate the scalability of RAVAN for larger model architectures by benchmarking the method against prior baselines on the GLUE benchmark [37] using LLaMA3.2-1B [13]. |
| Dataset Splits | Yes | For image classification, we adopt ViT-B/16 [11] (85M parameters) and fine-tune on two benchmarks: (i) CIFAR-100 (50,000 train / 10,000 test images, 100 classes) and (ii) SVHN (73,250 train / 26,032 test digits, 10 classes). For natural-language tasks, we fine-tune T5-Base [31] (224M parameters) on (i) 20 Newsgroups [28] (11,300 train / 7,532 test articles, 20 topics) and (ii) MRQA [14] (516,800 train / 58,221 test examples). ... For I.I.D. partitions, clients receive an equal-sized random subsample of the global training set. For non-I.I.D. partitions, we draw client-specific class proportions from a Dirichlet distribution with α = 0.3. For MRQA, which lacks class labels, the Dirichlet split is performed over the six constituent sub-datasets. |
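The client partitioning described in the Dataset Splits row can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`sample_dirichlet`, `iid_partition`, `dirichlet_partition`) are hypothetical, IID shards are assumed to be disjoint, every client is assumed to receive the same number of samples in the non-IID case, and a client whose preferred classes run out is assumed to renormalize its Dirichlet proportions over the classes that still have samples. The Dirichlet draw uses the standard construction of normalizing independent Gamma(α, 1) samples.

```python
import random
from collections import defaultdict


def sample_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) sample in k dimensions via normalized Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def iid_partition(num_samples, num_clients, seed=0):
    """Hypothetical helper: equal-sized random shards (assumed disjoint) of indices."""
    rng = random.Random(seed)
    idx = list(range(num_samples))
    rng.shuffle(idx)
    size = num_samples // num_clients
    return [idx[c * size:(c + 1) * size] for c in range(num_clients)]


def dirichlet_partition(labels, num_clients, alpha=0.3, seed=0):
    """Hypothetical helper: each client draws its own class proportions from Dir(alpha).

    Assumptions (not from the source): equal shard sizes, sampling without
    replacement, and renormalizing over classes that still have samples left.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)  # class label -> remaining sample indices
    for i, y in enumerate(labels):
        pools[y].append(i)
    classes = sorted(pools)
    for c in classes:
        rng.shuffle(pools[c])

    size = len(labels) // num_clients
    shards = []
    for _ in range(num_clients):
        props = sample_dirichlet(alpha, len(classes), rng)
        shard = []
        for _ in range(size):
            # Only classes with samples left are eligible; choices() renormalizes weights.
            avail = [(c, p) for c, p in zip(classes, props) if pools[c]]
            names = [c for c, _ in avail]
            weights = [p for _, p in avail]
            if sum(weights) == 0.0:  # every preferred class is exhausted
                weights = [1.0] * len(avail)
            c = rng.choices(names, weights=weights)[0]
            shard.append(pools[c].pop())
        shards.append(shard)
    return shards


if __name__ == "__main__":
    from collections import Counter

    toy_labels = [i % 10 for i in range(1000)]  # toy data: 10 balanced classes
    for shard in dirichlet_partition(toy_labels, num_clients=5, alpha=0.3):
        print(sorted(Counter(toy_labels[i] for i in shard).items()))
```

With α = 0.3 most of each client's probability mass typically falls on a few classes, which is what makes the split non-IID; larger α values push the per-client histograms toward uniform. For MRQA, which has no class labels, the same function could be called with each example's sub-dataset index (one of six values) in place of its class label.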
| Hardware Specification | Yes | All experiments were executed on a GPU cluster managed by SLURM. Each training job used a single NVIDIA V100 32GB GPU with 256 GB RAM. |
| Software Dependencies | Yes | Our environment used PyTorch 2.5.1 and Hugging Face 4.47.1 for all experiments. |
| Experiment Setup | Yes | Every selected client performs 50 local training iterations before uploading its update. Note, we intentionally train for 50 mini-batches and not 50 entire traversals of the client's training dataset so that each client performs exactly the same number of forward-backward passes. ... Table 7: FL hyperparameter settings used for each model-dataset pair. ... For RAVAN and each baseline, we run a learning rate hyperparameter sweep across the values {5e-5, 1e-5, 5e-4, 1e-4, 5e-3, 1e-3, 5e-2, 1e-2, 5e-2} and choose the most performant learning rate to report in our results. Table 8 lists the optimal choices for each baseline in all settings. The following results each use the Adam optimizer with momentum set to 0.9. |
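The Experiment Setup row describes two protocol details: a fixed budget of 50 local mini-batches (not 50 epochs) and a per-method learning-rate sweep. Both can be illustrated with a toy, dependency-free sketch. Everything beyond the step count and the grid is hypothetical: the helper names, the scalar least-squares "model", the batch size of 8, the use of negative distance to the target as the selection score, and the Adam constants β2 = 0.999 and ε = 1e-8 (common defaults). The reported "momentum 0.9" is read here as Adam's β1. The grid is copied from the text, which lists 5e-2 twice, so the duplicate is dropped.

```python
import math
import random

# Learning-rate grid copied from the text; 5e-2 appears twice there, so the
# duplicate is removed with set() before sweeping.
LR_GRID = sorted(set([5e-5, 1e-5, 5e-4, 1e-4, 5e-3, 1e-3, 5e-2, 1e-2, 5e-2]))


def fixed_step_batches(data, batch_size, num_steps, rng):
    """Yield exactly `num_steps` mini-batches, reshuffling whenever `data` runs out.

    Counting steps instead of epochs gives every client the same number of
    forward-backward passes, whatever the size of its local dataset.
    """
    if not data:
        raise ValueError("client has no local data")
    if num_steps <= 0:
        return
    steps = 0
    while True:
        order = list(data)
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]
            steps += 1
            if steps == num_steps:
                return


def local_update(w, data, lr, num_steps=50, batch_size=8, seed=0,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical toy client: fit scalar w by minimizing mean (w - x)^2 with Adam."""
    rng = random.Random(seed)
    m = v = 0.0
    steps = 0
    for batch in fixed_step_batches(data, batch_size, num_steps, rng):
        steps += 1
        grad = 2.0 * sum(w - x for x in batch) / len(batch)
        m = beta1 * m + (1 - beta1) * grad         # first moment (beta1 = 0.9)
        v = beta2 * v + (1 - beta2) * grad * grad  # second moment
        m_hat = m / (1 - beta1 ** steps)           # bias corrections
        v_hat = v / (1 - beta2 ** steps)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, steps


def sweep(evaluate, grid=LR_GRID):
    """Return (best_lr, best_score) for the learning rate maximizing evaluate(lr)."""
    scores = {lr: evaluate(lr) for lr in grid}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    data_rng = random.Random(1)
    client_data = [3.0 + data_rng.gauss(0.0, 0.1) for _ in range(20)]

    def evaluate(lr):
        w, _ = local_update(0.0, client_data, lr)
        return -abs(w - 3.0)  # toy score: closer to the target is better

    print(sweep(evaluate))
```

A client with 20 examples and a batch size of 8 passes over its data roughly 17 times in 50 steps, while a client with thousands of examples sees only a fraction of one epoch; counting mini-batches keeps the compute per round identical across clients, which is the rationale the text gives.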