Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

Authors: Haizhou Shi, Yibin Wang, Ligong Han, Huan Zhang, Hao Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through comprehensive experiments, we show that TFB achieves superior uncertainty estimation and generalization compared to existing methods while eliminating the need for complex Bayesianization training procedures.
Researcher Affiliation	Collaboration	1Rutgers University. 2University of Illinois Urbana-Champaign (UIUC). 3Red Hat AI Innovation. Correspondence to: Haizhou Shi <EMAIL>, Hao Wang <EMAIL>.
Pseudocode	Yes	Algorithm 1 Training-Free Bayesianization (TFB)
Open Source Code	Yes	Code is available at https://github.com/Wang-ML-Lab/bayesian-peft.
Open Datasets	Yes	For in-distribution experiments, we evaluate model performance on six commonsense reasoning tasks: Winogrande-Small (WG-S) and Winogrande-Medium (WG-M) [50], ARC-Challenge (ARC-C) and ARC-Easy (ARC-E) [11], Open Book Question Answering (OBQA) [42], and BoolQ [10].
Dataset Splits	Yes	Table 7: Dataset Statistics. The size of the Anchor Set D is used in Tables 1, 3, and 14. Size of Training Set: 640 | 1,119 | 2,251 | 2,258 | 4,957 | 9,427 | 20,652. Size of Anchor Set D: 500 (78%) | 500 (45%) | 500 (22%) | 500 (22%) | 500 (10%) | 500 (5%) | 500 (2%). Size of Test Set: 1,267 | 299 | 570 | 1,267 | 500 | 3,270 | 7,173.
Hardware Specification	Yes	Table 2: A comparison of running time and maximum GPU memory cost between TFB and BLoB during the process of Bayesianization. The experiments are conducted on a single NVIDIA A100 GPU.
Software Dependencies	No	The paper mentions software components like "AdamW optimizer", "LoRA configuration", "PiSSA", and "VeRA" but does not provide specific version numbers for any key libraries or frameworks used in the implementation.
Experiment Setup	Yes	Shared Configuration. We report the mean and standard deviation of all experimental results calculated over three random seeds. For all training processes in our experiments, we employ the AdamW optimizer. The learning rate follows a linear decay schedule with a warmup ratio of 0.06 and a maximum value of 2e-4. The batch size is set to 4, and the maximum sentence length is limited to 300 tokens. The LoRA configuration includes LoRA α = 16 and LoRA r = 8.
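
The shared configuration quoted in the Experiment Setup row can be sketched in plain Python to make the schedule concrete. This is a minimal illustration, not the authors' implementation: the names `SHARED_CONFIG` and `linear_warmup_decay_lr` are hypothetical, the total step count in the usage lines is an arbitrary placeholder, and two details are assumptions not stated in the quoted text, namely that the warmup rises linearly from zero and that the decay ends at zero.

```python
# Values taken from the quoted "Shared Configuration" text.
# The dictionary name and keys are hypothetical, chosen for illustration.
SHARED_CONFIG = {
    "optimizer": "AdamW",
    "max_lr": 2e-4,
    "warmup_ratio": 0.06,
    "batch_size": 4,
    "max_seq_len": 300,
    "lora_alpha": 16,
    "lora_r": 8,
    "num_seeds": 3,
}


def linear_warmup_decay_lr(step, total_steps, max_lr=2e-4, warmup_ratio=0.06):
    """Return the learning rate at a 0-indexed optimizer step.

    Assumptions (not stated in the source): the rate rises linearly from 0
    to max_lr during warmup, then falls linearly to 0 at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: linear ramp from 0 up to max_lr.
        return max_lr * step / warmup_steps
    # Decay phase: linear ramp from max_lr down to 0.
    decay_steps = max(1, total_steps - warmup_steps)
    return max_lr * max(0.0, (total_steps - step) / decay_steps)


if __name__ == "__main__":
    total = 1000  # placeholder; the real step count depends on the dataset
    for s in (0, 30, 60, 530, 1000):
        lr = linear_warmup_decay_lr(
            s, total, SHARED_CONFIG["max_lr"], SHARED_CONFIG["warmup_ratio"]
        )
        print(f"step {s:4d}: lr = {lr:.2e}")
```

With a warmup ratio of 0.06, the peak rate is reached after 6% of the optimizer steps, and the remaining 94% are spent in linear decay.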