Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fairness and robustness in anti-causal prediction

Authors: Maggie Makar, Alexander D'Amour

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility variables (Result — LLM response):
Research Type: Experimental — "Using a medical dataset, we empirically validate our findings on the task of detecting pneumonia from X-rays, in a setting where differences in prevalence across sex groups motivate a fairness mitigation. Our findings highlight the importance of considering causal structure when choosing and enforcing fairness criteria."
Researcher Affiliation: Collaboration — Maggie Makar (Computer Science and Engineering, University of Michigan, Ann Arbor, MI); Alexander D'Amour (Google Research, Cambridge, MA)
Pseudocode: No — The paper gives mathematical formulations of its learning objectives (Equations 3, 4, 5) but does not present them in a structured pseudocode or algorithm block; the steps are described in paragraph text.
Open Source Code: Yes — "We used publicly available code provided by the authors in Makar et al. (2021)." https://github.com/mymakar/causally_motivated_shortcut_removal
Open Datasets: Yes — "We conduct this analysis on a publicly available dataset, CheXpert (Irvin et al., 2019)."
Dataset Splits: Yes — "We split the dataset into 70% of the examples for training and validation, while the rest is held out for testing. Specifically, we split the training and validation data into 75% for training and 25% for validation. We further split the validation data into 5 folds."
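The split scheme quoted above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the function name, seed handling, and use of NumPy are assumptions.

```python
import numpy as np

def make_splits(n_examples, seed=0):
    """Sketch of the reported scheme: 70% train+validation vs. 30% test,
    then 75/25 train/validation, with validation further cut into 5 folds.
    Illustrative only; not the authors' implementation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)

    n_trainval = int(0.7 * n_examples)          # 70% for train+validation
    trainval, test = idx[:n_trainval], idx[n_trainval:]

    n_train = int(0.75 * len(trainval))         # 75/25 within train+validation
    train, val = trainval[:n_train], trainval[n_train:]

    val_folds = np.array_split(val, 5)          # 5 validation folds
    return train, val_folds, test
```

For 1,000 examples this yields 525 training, 175 validation (in 5 folds), and 300 test indices.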
Hardware Specification: No — The paper mentions using DenseNet-121, pretrained on ImageNet, fine-tuned for the task, and implemented in TensorFlow. However, it does not provide any specific details about the hardware (e.g., GPU model, CPU type) used to run the experiments.
Software Dependencies: No — "All models are implemented in TensorFlow (Abadi et al., 2015)." The paper does not provide version numbers for TensorFlow or any other software libraries used.
Experiment Setup: Yes — "We train all models using a batch size of 64, and image size 256×256, for 50 epochs. For the three MMD-based models, we need to pick the free parameter α, which controls how strictly we enforce the MMD penalty, and γ, which is the kernel bandwidth needed to compute the MMD. We follow the cross-validation procedure outlined in Makar et al. (2021). Specifically, we split the training and validation data into 75% for training and 25% for validation. We further split the validation data into 5 folds. We compute the MMD on each of the folds. For the DNN, we perform L2-regularization. We pick the regularization parameter based on the validation loss."
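To make the roles of α and γ concrete, here is a minimal sketch of an RBF-kernel squared-MMD estimator and an α-weighted penalty. The function names and the biased estimator form are assumptions for illustration; the exact penalty used by Makar et al. (2021) may differ.

```python
import numpy as np

def rbf_mmd2(x, y, gamma):
    """Biased estimator of squared MMD between samples x and y under an
    RBF kernel with bandwidth parameter gamma. Generic sketch, not the
    authors' implementation."""
    def k(a, b):
        # pairwise squared Euclidean distances, then RBF kernel
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def penalized_loss(task_loss, feats_group0, feats_group1, alpha, gamma):
    """Task loss plus an MMD penalty between the two groups' representations.
    alpha controls how strictly the penalty is enforced, as in the report."""
    return task_loss + alpha * rbf_mmd2(feats_group0, feats_group1, gamma)
```

Larger α pushes the two groups' representations to match more closely; γ sets the length scale at which differences between the samples are measured.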