Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Local Causal Discovery for Structural Evidence of Direct Discrimination
Authors: Jacqueline Maasch, Kyra Gan, Violet Chen, Agni Orfanoudaki, Nil-Jana Akpinar, Fei Wang
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use LD3 to analyze causal fairness in two complex decision systems: criminal recidivism prediction and liver transplant allocation. LD3 was more time-efficient and returned more plausible results on real-world data than baselines, which took 46× to 5,870× longer to execute. |
| Researcher Affiliation | Collaboration | Jacqueline Maasch¹, Kyra Gan¹, Violet Chen², Agni Orfanoudaki³, Nil-Jana Akpinar⁴*, Fei Wang⁵ (¹Cornell Tech; ²Stevens Institute of Technology; ³University of Oxford; ⁴Amazon AWS AI/ML, *work done outside Amazon; ⁵Weill Cornell Medicine) |
| Pseudocode | Yes | Algorithm 1: LD3. Input: exposure X, outcome Y, variable set Z, CI test of choice, significance level α. Output: adjustment set A_DE, SDC results. Assumptions: sufficient conditions A1 and A2. |
| Open Source Code | Yes | Code on GitHub: https://github.com/jmaasch/LD3 |
| Open Datasets | Yes | We assessed the ability of LD3 to facilitate CFA on the ProPublica COMPAS dataset. (...) All baselines were assessed on the SANGIOVESE benchmark from the bnlearn repository (Scutari 2010) (...). We use the national Standard Transplant Analysis and Research (STAR) dataset (OPTN 2024) for adult patients during 2017-2019 (n = 21,101) and 2020-2022 (n = 22,807). |
| Dataset Splits | Yes | Ten replicate datasets were sampled at n = [250, 500, 1000]. (...) Estimators used random forest classifiers with a 70% / 30% train-test split. |
| Hardware Specification | Yes | All experiments used an Apple MacBook (M2 Pro chip). |
| Software Dependencies | No | The paper mentions software components like "double machine learning" (with a citation to Chernozhukov et al. 2018 and a URL to econml) and the "bnlearn R Package" (Scutari 2010), but it does not provide specific version numbers for these or other software libraries. |
| Experiment Setup | Yes | All constraint-based methods used Fisher-z tests (α = 0.01). (...) Causal discovery used χ² CI tests and WCDE estimation used double machine learning (...) Estimators used random forest classifiers with a 70% / 30% train-test split. (...) We used three significance levels for independence testing (α = 0.005, 0.01, 0.05) to assess stability of results. |
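For readers unfamiliar with the Fisher-z independence tests quoted in the setup above, the sketch below shows the standard test in pure Python. The function name and interface are illustrative assumptions, not code from the LD3 repository; the test itself is the textbook Fisher z-transform of a (partial) correlation coefficient.

```python
import math

def fisher_z_independent(r, n, k, alpha=0.01):
    """Fisher-z test of the null hypothesis that a (partial) correlation is zero.

    r: sample (partial) correlation between two variables.
    n: number of samples used to compute r.
    k: size of the conditioning set.
    Returns True if the null (conditional independence) is NOT rejected at level alpha.
    """
    # Fisher z-transform stabilizes the variance of the correlation estimate.
    z = 0.5 * math.log((1 + r) / (1 - r))
    # Under the null, sqrt(n - k - 3) * z is approximately standard normal.
    stat = math.sqrt(n - k - 3) * abs(z)
    # Two-sided p-value via the standard normal CDF, computed with math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(stat / math.sqrt(2))))
    return p_value >= alpha
```

A strong correlation on a moderate sample, e.g. `fisher_z_independent(0.9, 100, 0)`, rejects independence, while a near-zero correlation such as `fisher_z_independent(0.01, 50, 0)` does not; varying `alpha` over {0.005, 0.01, 0.05} as in the quoted setup probes how sensitive such decisions are to the significance level.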