Why Is My Classifier Discriminatory?

Authors: Irene Chen, Fredrik D. Johansson, David Sontag

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we perform case-studies on prediction of income, mortality, and review ratings, confirming the value of this analysis. We find that data collection is often a means to reduce discrimination without sacrificing accuracy."
Researcher Affiliation | Academia | Irene Y. Chen (MIT, iychen@mit.edu), Fredrik D. Johansson (MIT, fredrikj@mit.edu), David Sontag (MIT, dsontag@csail.mit.edu)
Pseudocode | No | The paper describes its procedures and methods verbally but includes no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that source code for the described methodology is released nor links to a code repository.
Open Datasets | Yes | "The Adult dataset in the UCI Machine Learning Repository (Lichman, 2013)... Using the MIMIC-III dataset of all clinical notes from 25,879 adult patients from Beth Israel Deaconess Medical Center (Johnson et al., 2016)... Goodreads book reviews (Gnanesh, 2017)."
Dataset Splits | Yes | "Using an 80/20 train-test split... Training a model on 50% of the data, selecting hyper-parameters on 25%, and testing on 25%..." (see the sketch below this table)
Hardware Specification | No | The paper gives no hardware details (e.g., GPU/CPU models or memory) for running its experiments.
Software Dependencies | No | The paper names machine learning models (e.g., random forest, logistic regression, Latent Dirichlet Allocation) but no software libraries with version numbers needed for replication.
Experiment Setup | No | The paper mentions hyperparameter tuning ("We tune hyperparameters for each training set size...") but gives no specific hyperparameter values or optimizer settings in the main text, deferring "full training details" to the supplementary material.
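
The Dataset Splits and Experiment Setup rows above quote two concrete protocols: an 80/20 train-test split, and a 50/25/25 split in which hyper-parameters are selected on the validation quarter. The sketch below reproduces those proportions. Because the paper names no software libraries, hyper-parameter values, or seeds, the scikit-learn calls, the logistic-regression model (one of the model families the paper mentions), the regularization grid, and the random seed here are illustrative assumptions, not the authors' actual setup.

```python
# Illustrative sketch of the two quoted splitting protocols.
# Assumptions (not from the paper): scikit-learn, the C grid, random_state=0,
# and synthetic placeholder data standing in for Adult/MIMIC-III/Goodreads.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 10))      # placeholder features
y = rng.integers(0, 2, 1000)    # placeholder binary labels

# Protocol 1: plain 80/20 train-test split.
X_tr80, X_te20, y_tr80, y_te20 = train_test_split(
    X, y, test_size=0.20, random_state=0)

# Protocol 2: 50% train / 25% validation / 25% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.50, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=0)

# Select a regularization strength on the validation quarter
# ("selecting hyper-parameters on 25%"); the grid is hypothetical.
best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_C, best_acc = C, acc

# Evaluate the selected model on the held-out test quarter.
final = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print("selected C:", best_C, "| test accuracy:",
      accuracy_score(y_test, final.predict(X_test)))
```

The actual grids, models, and training details differ per experiment; the paper defers those specifics to its supplementary material.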