Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Uni-LoRA: One Vector is All You Need

Authors: Kaiyang Li, Shaobo Han, Qing Su, Wei Li, Zhipeng Cai, Shihao Ji

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on GLUE, mathematical reasoning, and instruction tuning benchmarks demonstrate that Uni-LoRA achieves state-of-the-art parameter efficiency while outperforming or matching prior approaches in predictive performance.
Researcher Affiliation	Collaboration	Kaiyang Li, School of Computing, University of Connecticut, Storrs, CT 06269, EMAIL; Shaobo Han, Optical Networking and Sensing, NEC Labs America, Princeton, NJ 08540, EMAIL; Qing Su, School of Computing, University of Connecticut, Storrs, CT 06269, EMAIL; Wei Li, Dept. of Computer Science, Georgia State University, Atlanta, GA 30303, EMAIL; Zhipeng Cai, Dept. of Computer Science, Georgia State University, Atlanta, GA 30303, EMAIL; Shihao Ji, School of Computing, University of Connecticut, Storrs, CT 06269, EMAIL
Pseudocode	Yes	Algorithm 1 Pseudocode of Uni-LoRA in a PyTorch-like Style
Open Source Code	Yes	Our code is available at https://github.com/KaiyangLi1992/Uni-LoRA.
Open Datasets	Yes	Natural Language Understanding. We use RoBERTa-base and RoBERTa-large models developed by Facebook AI, released under the MIT License and available at: https://huggingface.co/roberta-base, https://huggingface.co/roberta-large. We evaluate on the GLUE benchmark, which is publicly available at https://gluebenchmark.com/ and composed of multiple sub-datasets under various open licenses, as documented on the GLUE website. Mathematical Reasoning. We fine-tune the Mistral-7B-v0.1 model, released under the Apache 2.0 License and available at https://huggingface.co/mistralai/Mistral-7B-v0.1, and the Gemma-7B model, which requires agreement to Google's usage license and is available at https://huggingface.co/google/gemma-7b. We use the MetaMathQA dataset, available under the MIT License at https://huggingface.co/datasets/meta-math/MetaMathQA, and evaluate on GSM8K and MATH datasets, both under the MIT License and available at: https://huggingface.co/datasets/openai/gsm8k, https://github.com/hendrycks/math. Instruction Tuning. We use the Cleaned Alpaca dataset, which improves upon the original Alpaca dataset. Both versions are licensed under CC BY-NC 4.0 and available at: https://huggingface.co/datasets/tatsu-lab/alpaca, https://huggingface.co/datasets/yahma/alpaca-cleaned. We evaluate on MT-Bench, released under CC BY 4.0 and available at https://huggingface.co/datasets/lmsys/mt_bench_human_judgments. Fine-tuning is performed on the LLaMA 2 model, licensed under the LLAMA 2 Community License and available at https://huggingface.co/meta-llama, using the QLoRA framework, released under the MIT License and available at https://github.com/artidoro/qlora.
Dataset Splits	Yes	We adopt the General Language Understanding Evaluation (GLUE) benchmark [17] to assess the performance of Uni-LoRA across various natural language understanding tasks. Following [4, 6], we focus on six tasks from GLUE: SST-2 [18] (sentiment analysis), MRPC [19] (paraphrase detection), CoLA [20] (linguistic acceptability), QNLI [21] (inference), RTE [22] (inference) and STS-B [23] (semantic textual similarity). ... To evaluate mathematical reasoning capabilities, we fine-tune the Mistral-7B-v0.1 [25] and Gemma-7B [26] models on the MetaMathQA [27] dataset and test them on GSM8K [28] and MATH [29].
Hardware Specification	Yes	All our experiments are conducted on a server equipped with 8 NVIDIA A100 80GB GPUs. For reproducibility, we provide detailed hyperparameters and specifications of computing resources for each experiment in Appendix A.2.
Software Dependencies	No	Algorithm 1 provides the PyTorch-like pseudocode for Uni-LoRA, which can be seamlessly integrated into the PyTorch framework.
Experiment Setup	Yes	For reproducibility, we provide detailed hyperparameters and specifications of computing resources for each experiment in Appendix A.2. ... Table 8: Hyperparameters and computing resources used in the natural language understanding experiments on the GLUE benchmark. h: hour, m: minute. Model: RoBERTa-base. Per-task values are listed in the order SST-2 / MRPC / CoLA / QNLI / RTE / STS-B; single values apply to all six tasks. Optimizer: AdamW; Warmup Ratio: 0.06; LR Schedule: Linear; Init. of θd: U(-0.02, 0.02); # GPUs: 1; Epochs: 60 / 30 / 80 / 25 / 160 / 80; Learning Rate (Head): 1E-4 / 2E-2 / 5E-3 / 2E-4 / 5E-4 / 2E-4; Learning Rate (θd): 5E-3; \|θd\|: 23,040; Rank: 4; Max Seq. Len.: 512; Batch Size Per GPU: 32; Training Time: 9.2h / 15.5m / 1.8h / 6.2h / 1h / 1.1h; GPU Memory: 24,310 MiB
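
The RoBERTa-base settings quoted from Table 8 can be collected into a small configuration object, which may help when checking a rerun against the reported setup. The following is a minimal sketch, not the authors' code: the names `GlueRunConfig`, `make_config`, and `lr_multiplier` are hypothetical, the initialization bound of 0.02 is read as a symmetric range, and the "Linear" schedule is assumed to mean linear warmup followed by linear decay to zero.

```python
from dataclasses import dataclass

# Values transcribed from the Table 8 excerpt above (RoBERTa-base only).
TASKS = ("SST-2", "MRPC", "CoLA", "QNLI", "RTE", "STS-B")
EPOCHS = dict(zip(TASKS, (60, 30, 80, 25, 160, 80)))
HEAD_LR = dict(zip(TASKS, (1e-4, 2e-2, 5e-3, 2e-4, 5e-4, 2e-4)))


@dataclass(frozen=True)
class GlueRunConfig:
    """Hypothetical container for one GLUE run; field names are illustrative."""
    task: str
    epochs: int
    head_lr: float
    theta_lr: float = 5e-3          # Learning Rate (theta_d)
    theta_dim: int = 23_040         # |theta_d|
    theta_init_bound: float = 0.02  # assumed symmetric uniform init bound
    rank: int = 4
    warmup_ratio: float = 0.06
    max_seq_len: int = 512
    batch_size_per_gpu: int = 32
    num_gpus: int = 1


def make_config(task: str) -> GlueRunConfig:
    """Build the config for one of the six GLUE tasks listed in TASKS."""
    return GlueRunConfig(task=task, epochs=EPOCHS[task], head_lr=HEAD_LR[task])


def lr_multiplier(step: int, total_steps: int, warmup_ratio: float) -> float:
    """Assumed schedule: linear warmup from 0 to 1, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

For example, `make_config("RTE")` carries 160 epochs and a head learning rate of 5E-4, and multiplying either learning rate by `lr_multiplier(step, total_steps, 0.06)` gives the rate at a given step under the assumed schedule. The total step count depends on dataset size and batch size, which the excerpt does not state, so it is left as an input.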