Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Certifying Counterfactual Bias in LLMs
Authors: Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza Ziyadi, Rahul Gupta, Gagandeep Singh
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS We used 2 A100 GPUs, each with 40GB VRAM. We derive the queries on which the specifications from the 3 prefix distributions presented in Section 4 are pivoted, from popular datasets for fairness and bias assessment BOLD (Dhamala et al., 2021) and Decoding Trust (Wang et al., 2024). |
| Researcher Affiliation | Collaboration | 1 UIUC, 2 Amazon, 3 Oracle Health |
| Pseudocode | Yes | Algorithm 1 Prefix specification Input: L, Q; Output: C( , D, L) ... Algorithm 2 Make random prefix ... Algorithm 3 Make mixture of jailbreak prefix ... Algorithm 4 Make soft prefix |
| Open Source Code | Yes | Our implementation is available at https://github.com/uiuc-focal-lab/LLMCert-B and we provide guidelines for using our framework for practitioners in Appendix A. |
| Open Datasets | Yes | We derive the queries on which the specifications from the 3 prefix distributions presented in Section 4 are pivoted, from popular datasets for fairness and bias assessment BOLD (Dhamala et al., 2021) and Decoding Trust (Wang et al., 2024). |
| Dataset Splits | Yes | BOLD setup. BOLD is a dataset of partial sentences to demonstrate bias in the generations of LLMs in common situations. We pick a test set of 250 samples randomly from BOLD's profession partition and demonstrate binary gender bias specifications and certificates on it. ... Decoding Trust setup. ... We make specifications from all 48 statements in the stereotypes partition for demographic groups corresponding to race (black/white). |
| Hardware Specification | Yes | We used 2 A100 GPUs, each with 40GB VRAM. |
| Software Dependencies | No | The paper mentions open-sourcing its implementation and refers to general LLMs like GPT-4, Llama-2-chat, etc., but does not provide specific version numbers for software components like Python, PyTorch, or CUDA used in their own implementation of LLMCert-B. |
| Experiment Setup | Yes | The values of the certification parameters used in our experiments are given in Table 2 (Appendix E). We study their effect on the certification results with an ablation study in Appendix E. We generate the certification bounds with 95% confidence and 50 samples. |
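The quoted setup derives certification bounds from 50 samples at 95% confidence. The paper's exact bounding method is not stated in this excerpt; as a minimal illustration of how such a guarantee can be formed, the sketch below uses a Hoeffding-style two-sided interval for a Bernoulli mean (the function name and the 30/50 success count are hypothetical, chosen only for the example).

```python
import math

def hoeffding_bound(successes: int, n: int, confidence: float = 0.95):
    """Two-sided Hoeffding confidence interval for a Bernoulli mean.

    With probability at least `confidence`, the true mean lies in the
    returned (lower, upper) interval around the empirical estimate.
    """
    p_hat = successes / n
    delta = 1.0 - confidence
    # Hoeffding: P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps^2) = delta
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)

# Example matching the quoted setup: 50 samples, 95% confidence
lo, hi = hoeffding_bound(successes=30, n=50)
print(f"estimate 0.600 certified within [{lo:.3f}, {hi:.3f}]")
```

With n = 50 the interval half-width is about 0.19, which illustrates why the sample count and confidence level are key certification parameters (cf. the ablation in Appendix E of the paper).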