Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
Authors: Haizhou Shi, Yibin Wang, Ligong Han, Huan Zhang, Hao Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments, we show that TFB achieves superior uncertainty estimation and generalization compared to existing methods while eliminating the need for complex Bayesianization training procedures. |
| Researcher Affiliation | Collaboration | 1Rutgers University. 2University of Illinois Urbana-Champaign (UIUC). 3Red Hat AI Innovation. Correspondence to: Haizhou Shi <EMAIL>, Hao Wang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Training-Free Bayesianization (TFB) |
| Open Source Code | Yes | Code is available at https://github.com/Wang-ML-Lab/bayesian-peft. |
| Open Datasets | Yes | For in-distribution experiments, we evaluate model performance on six commonsense reasoning tasks: Winogrande-Small (WG-S) and Winogrande-Medium (WG-M) [50], ARC-Challenge (ARC-C) and ARC-Easy (ARC-E) [11], Open Book Question Answering (OBQA) [42], and BoolQ [10]. |
| Dataset Splits | Yes | Table 7: Dataset Statistics. The size of the Anchor Set D is used in Tables 1, 3, and 14. Size of Training Set: 640; 1,119; 2,251; 2,258; 4,957; 9,427; 20,652. Size of Anchor Set D: 500 (78%); 500 (45%); 500 (22%); 500 (22%); 500 (10%); 500 (5%); 500 (2%). Size of Test Set: 1,267; 299; 570; 1,267; 500; 3,270; 7,173. |
| Hardware Specification | Yes | Table 2: A comparison of running time and maximum GPU memory cost between TFB and BLoB during the process of Bayesianization. The experiments are conducted on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions software components such as the "AdamW optimizer", "LoRA configuration", "PiSSA", and "VeRA" but does not provide version numbers for any key libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | Shared Configuration. We report the mean and standard deviation of all experimental results calculated over three random seeds. For all training processes in our experiments, we employ the AdamW optimizer. The learning rate follows a linear decay schedule with a warmup ratio of 0.06 and a maximum value of 2e-4. The batch size is set to 4, and the maximum sentence length is limited to 300 tokens. The LoRA configuration includes LoRA α = 16 and LoRA r = 8. |
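
The shared configuration quoted in the Experiment Setup row can be written down as a small, dependency-free sketch. This is not the authors' code: the names `SharedConfig`, `warmup_steps`, and `linear_warmup_decay_lr` are hypothetical, the total step count is an illustrative input, and the shape of the schedule (linear ramp from zero during warmup, then linear decay to zero) is an assumption, since the quoted text gives only the warmup ratio, the peak value, and the word "linear".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedConfig:
    """Hyperparameters as quoted in the Experiment Setup row."""
    max_lr: float = 2e-4        # peak learning rate
    warmup_ratio: float = 0.06  # fraction of total steps spent warming up
    batch_size: int = 4
    max_seq_len: int = 300      # maximum sentence length in tokens
    lora_alpha: int = 16
    lora_r: int = 8
    num_seeds: int = 3          # results are averaged over three seeds


def warmup_steps(total_steps: int, cfg: SharedConfig) -> int:
    """Number of warmup steps implied by the warmup ratio."""
    return int(cfg.warmup_ratio * total_steps)


def linear_warmup_decay_lr(step: int, total_steps: int, cfg: SharedConfig) -> float:
    """Learning rate at `step`, assuming a linear ramp from 0 to max_lr
    during warmup and a linear decay back to 0 afterwards."""
    n_warmup = warmup_steps(total_steps, cfg)
    if step < n_warmup:
        return cfg.max_lr * step / max(1, n_warmup)
    remaining = max(0, total_steps - step)
    return cfg.max_lr * remaining / max(1, total_steps - n_warmup)


def lora_scaling(cfg: SharedConfig) -> float:
    """Standard LoRA scaling factor alpha / r applied to the low-rank update."""
    return cfg.lora_alpha / cfg.lora_r


if __name__ == "__main__":
    cfg = SharedConfig()
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, warmup_steps(total, cfg), total // 2, total):
        print(s, linear_warmup_decay_lr(s, total, cfg))
    print("LoRA scaling:", lora_scaling(cfg))
```

With α = 16 and r = 8, the conventional LoRA scaling factor α / r works out to 2. The software versions needed to reproduce the schedule exactly are not reported, so any framework scheduler used in its place should be checked against these values.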