Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
An Information-Theoretic Quantification of Discrimination with Exempt Features
Authors: Sanghamitra Dutta, Praveen Venkatesh, Piotr Mardziel, Anupam Datta, Pulkit Grover
Pages: 3825-3833
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then perform a case study using one observational measure to show how one might train a model allowing for exemption of discrimination due to critical features. Case study: the goal is to decide whether to show ads for an editorial job requiring English proficiency, based on whether a score generated from internet activity is above a threshold. We train a classifier of the form $\hat{Y} = 1/(1 + e^{-(w^T X + b)})$ (logistic regression). |
| Researcher Affiliation | Academia | Sanghamitra Dutta, Praveen Venkatesh, Piotr Mardziel, Anupam Datta, Pulkit Grover; Carnegie Mellon University |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper describes a synthetic dataset for the case study, e.g., '$Z \sim \mathrm{Bern}(1/2)$ is a protected attribute...', '$U_1, U_2, U_3 \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$'. However, it does not provide concrete access information (link, DOI, citation, or repository) for this dataset. |
| Dataset Splits | No | The paper mentions training a classifier and performing '100 simulations of 7000 iterations each with batch size 200' but does not specify explicit training, validation, or test dataset splits or their sizes. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers. |
| Experiment Setup | Yes | The paper specifies training details such as 'We train a classifier of the form $\hat{Y} = 1/(1 + e^{-(w^T X + b)})$ (logistic regression)... We train using the following loss functions: Loss L1: $\min_{w,b} L_{\text{Cross Entropy}}(Y, \hat{Y})$. Loss L2: $\min_{w,b} L_{\text{Cross Entropy}}(Y, \hat{Y}) + \lambda I(Z; \hat{Y})$, ... Loss L3: $\min_{w,b} L_{\text{Cross Entropy}}(Y, \hat{Y} \mid X_c)$... (100 simulations of 7000 iterations each with batch size 200).' |
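The case study quoted in the Research Type and Experiment Setup rows can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code, and no released code is reported above. Only the logistic form of $\hat{Y}$, losses L1 and L2, $Z \sim \mathrm{Bern}(1/2)$, the i.i.d. $\mathcal{N}(0, 1)$ noise terms, the batch size of 200, and the 7000-iteration default come from the summary. Everything else is assumed: the structural equations in the hypothetical `make_data` helper, the plug-in estimate of $I(Z; \hat{Y})$ that treats $\hat{Y}$ as a Bernoulli draw with probability equal to the logistic output, the learning rate, and the use of central finite differences instead of automatic differentiation. Loss L3 and the paper's exempt-feature measure are not reproduced here.

```python
import numpy as np


def sigmoid(t):
    # Logistic function, so Y_hat = sigmoid(w^T x + b) = 1 / (1 + exp(-(w^T x + b))).
    return 1.0 / (1.0 + np.exp(-t))


def make_data(n, rng):
    # HYPOTHETICAL generative model. Only Z ~ Bern(1/2) and U1, U2, U3 i.i.d. N(0, 1)
    # come from the summary; how they are combined below is an illustrative guess.
    z = rng.integers(0, 2, size=n).astype(float)
    u1, u2, u3 = rng.standard_normal((3, n))
    x_c = z + u1  # hypothetical critical feature (e.g. a proxy for English proficiency)
    x_g = z + u2  # hypothetical non-critical feature that also leaks Z
    x_n = u3      # hypothetical feature independent of Z
    X = np.column_stack([x_c, x_g, x_n])
    y = (x_c + 0.5 * x_n > 0.5).astype(float)  # hypothetical ground-truth label
    return X, y, z


def cross_entropy(y, p, eps=1e-12):
    # Mean binary cross-entropy L_CE(Y, Y_hat), in nats.
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def binary_entropy(q, eps=1e-12):
    # Entropy (in nats) of a Bernoulli(q) variable.
    q = np.clip(q, eps, 1.0 - eps)
    return -(q * np.log(q) + (1.0 - q) * np.log(1.0 - q))


def mutual_info_z_yhat(z, p):
    # ASSUMED plug-in estimator: treat Y_hat as a Bernoulli(p(x)) decision, so
    # P(Y_hat = 1 | Z = v) is the mean of p over samples with Z = v, and
    # I(Z; Y_hat) = H(Y_hat) - H(Y_hat | Z).
    h_marginal = binary_entropy(p.mean())
    h_conditional = 0.0
    for v in (0.0, 1.0):
        mask = z == v
        if mask.any():
            h_conditional += mask.mean() * binary_entropy(p[mask].mean())
    return h_marginal - h_conditional


def loss(params, X, y, z, lam):
    # lam = 0 gives loss L1; lam > 0 gives loss L2 (cross-entropy + lam * I(Z; Y_hat)).
    w, b = params[:-1], params[-1]
    p = sigmoid(X @ w + b)
    return cross_entropy(y, p) + lam * mutual_info_z_yhat(z, p)


def numerical_grad(f, params, h=1e-5):
    # Central finite differences; adequate for the 4 parameters (w, b) used here.
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (f(params + step) - f(params - step)) / (2.0 * h)
    return grad


def train(X, y, z, lam, iters=7000, batch=200, lr=0.5, seed=0):
    # Mini-batch gradient descent over (w, b); lr and seed are assumptions.
    rng = np.random.default_rng(seed)
    params = np.zeros(X.shape[1] + 1)
    for _ in range(iters):
        idx = rng.integers(0, len(y), size=batch)
        Xb, yb, zb = X[idx], y[idx], z[idx]
        params -= lr * numerical_grad(lambda q: loss(q, Xb, yb, zb, lam), params)
    return params


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    X, y, z = make_data(10_000, data_rng)
    for lam in (0.0, 5.0):
        # Fewer than the 7000 reported iterations, to keep the demo quick.
        params = train(X, y, z, lam, iters=2000)
        p = sigmoid(X @ params[:-1] + params[-1])
        print(f"lam={lam}: CE={cross_entropy(y, p):.3f}, "
              f"I(Z; Y_hat)={mutual_info_z_yhat(z, p):.4f}")
```

Setting `lam=0.0` recovers loss L1; a positive `lam` adds the mutual-information penalty of loss L2, which on this hypothetical data should lower the estimated $I(Z; \hat{Y})$ at some cost in cross-entropy. Finite differences keep the sketch dependent only on NumPy; an automatic-differentiation framework would be the more natural choice for larger models.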