Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
Authors: Xi Chen, Kaituo Feng, Changsheng Li, Xunhao Lai, Xiangyu Yue, Ye Yuan, Guoren Wang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the pre-training and fine-tuning of LLMs show that Fira outperforms both LoRA and GaLore. Notably, for pre-training LLaMA 7B, our Fira uses 8× smaller memory of optimizer states than GaLore, yet outperforms it by a large margin. |
| Researcher Affiliation | Academia | 1Beijing Institute of Technology 2School of Intelligence Science and Technology, Peking University 3MMLab, The Chinese University of Hong Kong 4Hebei Province Key Laboratory of Big Data Science and Intelligent Technology |
| Pseudocode | Yes | A Fira Implementation A.1 Algorithm Pseudocode Algorithm 1 Fira with Adam ... A.2 Plug-and-play Framework for Fira Algorithm 2 Plug-and-play framework for Fira, Pytorch-like. |
| Open Source Code | Yes | We will release the source code and package of our Fira into a Python library for easy use. https://github.com/xichen-fy/Fira |
| Open Datasets | Yes | Adam optimizer is used to train all baselines and our method on the C4 dataset in the BF16 format. The dataset C4 is a colossal, cleaned version of Common Crawl's web crawl corpus, which is widely used in LLM pre-training [26]. |
| Dataset Splits | No | Following [14], we perform the fine-tuning task to compare Fira with LoRA... This task consists of eight sub-tasks, each with its own designated training and testing sets. Following the approach of [14], we combine the training datasets from all eight sub-tasks into a unified training set, while evaluating each sub-task individually using its respective testing dataset. |
| Hardware Specification | Yes | We use 8 A100 80G GPUs to conduct pre-training experiments. ... We adopt RTX 4090 GPUs for fine-tuning experiments. |
| Software Dependencies | No | Adam optimizer is used to train all baselines and our method on the C4 dataset in the BF16 format. |
| Experiment Setup | Yes | The detailed settings of pre-training are provided in Appendix B.1. The detailed settings of fine-tuning are provided in Appendix B.2. |