Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
Authors: Yifan Wang, Sukrut Rao, Ji-Ung Lee, Mayank Jobanputra, Vera Demberg
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Automatic and human evaluation results demonstrate that B-cos LMs produce more faithful and human-interpretable explanations than post-hoc methods, while maintaining task performance comparable to conventional fine-tuning. Our in-depth analysis explores how B-cos LMs differ from conventionally fine-tuned models in their learning processes and explanation patterns. Finally, we present a first exploration of transforming decoder-only models to B-cos LMs for generation tasks. |
| Researcher Affiliation | Academia | Yifan Wang (EMAIL), Saarland University, Saarbrücken, Germany; Sukrut Rao (EMAIL), Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany; Ji-Ung Lee (EMAIL), Saarland University, Saarbrücken, Germany; Mayank Jobanputra (EMAIL), Saarland University, Saarbrücken, Germany; Vera Demberg (EMAIL), Saarland University, Saarbrücken, Germany, and Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany |
| Pseudocode | No | The paper describes methodologies in text and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Ewanwong/bcos_lm. |
| Open Datasets | Yes | Our experiments use three datasets: AG News (topic classification, Zhang et al., 2015), IMDB (sentiment analysis, Maas et al., 2011), and HateXplain (hate speech detection, Mathew et al., 2021). ... we use the BLiMP dataset (Warstadt et al., 2020) to assess explanations for linguistic phenomena, and the Indirect Object Identification (IOI) dataset (Brian Muhia, 2022) to test models' reasoning about object identification. |
| Dataset Splits | Yes | For validation, we randomly sample half of the test set from IMDB and AG News. ... For faithfulness evaluation, we conduct perturbation-based evaluations on 2,000 test examples and SeqPG on 500 test examples for AG News and IMDB. For HateXplain, we use the full test set for perturbation-based evaluation (1,924 examples) and construct 269, 310, and 308 SeqPG examples from it using BERT, DistilBERT, and RoBERTa, respectively. |
| Hardware Specification | Yes | Unless stated otherwise, all experiments are conducted on a single NVIDIA H100 GPU. |
| Software Dependencies | No | For all PLMs used in the experiments, we use the uncased base version from Hugging Face (Wolf et al., 2020). For IxG and ShapSampl, we use the Captum (Kokhlikyan et al., 2020) implementations. We implement the Attention method ourselves, and LIME is sourced from the lit library. For DecompX and SIG, we use their official implementations with default configurations. ... For Saloss models, we use the official codebase with default hyperparameters to train BERT and RoBERTa on AG News, IMDB, and HateXplain. |
| Experiment Setup | Yes | For both conventional models and B-cos LMs, we train them for 5 epochs with 10% linear warm-up steps on the downstream task datasets. The learning rates are set to 2e-5 for IMDB and HateXplain, and 3e-5 for AG News. All models use a batch size of 16 and a maximum sequence length of 512. For validation, we randomly sample half of the test set from IMDB and AG News. We set B=1.25 for IMDB and B=1.5 for AG News and HateXplain datasets. |
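The reported fine-tuning settings can be collected into a small configuration sketch. Only the hyperparameters in the row above come from the paper; the `warmup_steps` helper and its use of AG News's 120,000 training examples are illustrative assumptions about how "10% linear warm-up steps" would be computed:

```python
import math

# Hyperparameters reported in the Experiment Setup row.
EPOCHS = 5
BATCH_SIZE = 16
MAX_SEQ_LEN = 512
WARMUP_FRACTION = 0.10  # "10% linear warm-up steps"

# Per-dataset settings as reported; B is the B-cos exponent.
DATASETS = {
    "IMDB":       {"lr": 2e-5, "B": 1.25},
    "HateXplain": {"lr": 2e-5, "B": 1.5},
    "AG News":    {"lr": 3e-5, "B": 1.5},
}

def warmup_steps(num_train_examples: int) -> int:
    """Assumed reading: warm-up steps = 10% of total optimizer steps."""
    steps_per_epoch = math.ceil(num_train_examples / BATCH_SIZE)
    return int(WARMUP_FRACTION * steps_per_epoch * EPOCHS)

# AG News has 120,000 training examples:
# 120,000 / 16 = 7,500 steps/epoch; * 5 epochs = 37,500; * 0.10 = 3,750.
print(warmup_steps(120_000))  # -> 3750
```

Whether warm-up is taken over total steps or per epoch is not stated in the excerpt; the sketch assumes the common total-steps convention (e.g., `warmup_ratio` in Hugging Face's `TrainingArguments`).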