Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Certifying Counterfactual Bias in LLMs
Authors: Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza Ziyadi, Rahul Gupta, Gagandeep Singh
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS We used 2 A100 GPUs, each with 40GB VRAM. We derive the queries on which the specifications from the 3 prefix distributions presented in Section 4 are pivoted, from popular datasets for fairness and bias assessment BOLD (Dhamala et al., 2021) and Decoding Trust (Wang et al., 2024). |
| Researcher Affiliation | Collaboration | 1 UIUC, 2 Amazon, 3 Oracle Health |
| Pseudocode | Yes | Algorithm 1 Prefix specification Input: L, Q; Output: C( , D, L) ... Algorithm 2 Make random prefix ... Algorithm 3 Make mixture of jailbreak prefix ... Algorithm 4 Make soft prefix |
| Open Source Code | Yes | Our implementation is available at https://github.com/uiuc-focal-lab/LLMCert-B and we provide guidelines for using our framework for practitioners in Appendix A. |
| Open Datasets | Yes | We derive the queries on which the specifications from the 3 prefix distributions presented in Section 4 are pivoted, from popular datasets for fairness and bias assessment BOLD (Dhamala et al., 2021) and Decoding Trust (Wang et al., 2024). |
| Dataset Splits | Yes | BOLD setup. BOLD is a dataset of partial sentences to demonstrate bias in the generations of LLMs in common situations. We pick a test set of 250 samples randomly from BOLD's profession partition and demonstrate binary gender bias specifications and certificates on it. ... Decoding Trust setup. ... We make specifications from all 48 statements in the stereotypes partition for demographic groups corresponding to race (black/white). |
| Hardware Specification | Yes | We used 2 A100 GPUs, each with 40GB VRAM. |
| Software Dependencies | No | The paper mentions open-sourcing its implementation and refers to general LLMs like GPT-4, Llama-2-chat, etc., but does not provide specific version numbers for software components like Python, PyTorch, or CUDA used in their own implementation of LLMCert-B. |
| Experiment Setup | Yes | The values of the certification parameters used in our experiments are given in Table 2 (Appendix E). We study their effect on the certification results with an ablation study in Appendix E. We generate the certification bounds with 95% confidence and 50 samples. |
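
The Experiment Setup row quotes certification bounds generated "with 95% confidence and 50 samples". As a rough illustration of what such a bound can look like, the sketch below computes a two-sided binomial confidence interval from 50 pass/fail samples. The excerpt does not name the interval method, so the choice of a Clopper-Pearson (exact binomial) interval is an assumption, and the function names `binom_cdf` and `clopper_pearson` are hypothetical helpers, not part of the LLMCert-B code base.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P[X <= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 60) -> float:
    """Locate the root of a decreasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Two-sided exact binomial interval for k successes out of n samples.

    ASSUMPTION: Clopper-Pearson is used here only for illustration; the
    quoted text specifies the confidence level and sample count, not the method.
    """
    alpha = 1 - confidence
    # Lower bound: the p at which P[X >= k] equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p))
    )
    # Upper bound: the p at which P[X <= k] equals alpha / 2.
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    # Hypothetical outcome: 40 of 50 sampled prompts yield an unbiased response.
    low, high = clopper_pearson(40, 50, confidence=0.95)
    print(f"95% interval for the unbiased-response probability: [{low:.3f}, {high:.3f}]")
```

With only 50 samples the interval is fairly wide, which is one reason the quoted ablation over certification parameters matters when interpreting a reported bound.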