Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective
Authors: Zubair Bashir, Bhavik Chandna, Procheta Sen
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For demographic and gender bias-related experiments, we used two different types of datasets. The Demographic Bias dataset used in our experiment is from an existing study in Narayanan Venkit et al. (2023). [...] Table 3 shows the performance on two NLP tasks: CoNLL-2003 Sang & De Meulder (2003), a named entity recognition benchmark, and CoLA Warstadt (2019), a linguistic acceptability judgment task for all the models. |
| Researcher Affiliation | Academia | Zubair Bashir EMAIL Indian Institute of Technology, Kharagpur; Bhavik Chandna EMAIL University of California San Diego; Procheta Sen EMAIL University of Liverpool |
| Pseudocode | No | The paper describes methodologies and mathematical formulations but does not contain any clearly labeled pseudocode or algorithm blocks. Descriptions are provided in prose. |
| Open Source Code | Yes | Our code is available at https://github.com/zubair2004/MI_Bias. |
| Open Datasets | Yes | The Demographic Bias dataset used in our experiment is from an existing study in Narayanan Venkit et al. (2023). [...] To understand Gender Bias in models, we used the set of 320 professions chosen and annotated from Bolukbasi et al. (2016b). |
| Dataset Splits | Yes | Each model was fine-tuned for 20 epochs on the respective datasets, with early stopping based on validation loss. Evaluation was conducted on held-out validation splits, and circuit changes were analyzed using attention weight inspection and edge attribution methods within TransformerLens. |
| Hardware Specification | Yes | We conducted all the experiments in a computing machine having two A100 GPUs. |
| Software Dependencies | No | The paper mentions "Hooked-Transformer from TransformerLens repository" and "Distilbert-base-uncased model" but does not provide specific version numbers for these or any other key software dependencies. |
| Experiment Setup | Yes | Fine-tuning was performed with the AdamW optimizer using a learning rate of 10⁻⁴ and a batch size of 129. Each model was fine-tuned for 20 epochs on the respective datasets, with early stopping based on validation loss. |
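
The fine-tuning setup reported in the last row can be sketched as a small configuration plus an early-stopping rule. This is an illustrative sketch only, not the authors' code: the names `FineTuneConfig` and `early_stopping_epoch` are hypothetical, the learning rate of 1e-4 is an assumed reading of the reported value, and the `patience` setting is an assumption because the table does not state one.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters taken from the Experiment Setup row (hypothetical container)."""
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4  # assumed reading of the reported learning rate
    batch_size: int = 129        # as reported in the table
    max_epochs: int = 20         # as reported in the table
    patience: int = 3            # assumption: the table gives no patience value


def early_stopping_epoch(val_losses: Sequence[float], cfg: FineTuneConfig) -> int:
    """Return the 1-based epoch at which training stops.

    Training stops once validation loss has failed to improve for
    `cfg.patience` consecutive epochs, or after `cfg.max_epochs` epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    capped = list(val_losses)[: cfg.max_epochs]
    for epoch, loss in enumerate(capped, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= cfg.patience:
                return epoch
    return len(capped)


cfg = FineTuneConfig()
# Made-up validation losses, used only to exercise the stopping rule.
stop = early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.95, 0.7], cfg)
print(f"stopped at epoch {stop}")
```

In an actual run, the loss values would come from evaluating on the held-out validation split after each epoch, and the optimizer would be built from `cfg.learning_rate` in whatever training framework is used.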