Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fairness and robustness in anti-causal prediction

Authors: Maggie Makar, Alexander D'Amour

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility variables (Result — LLM response):
Research Type: Experimental — "Using a medical dataset, we empirically validate our findings on the task of detecting pneumonia from X-rays, in a setting where differences in prevalence across sex groups motivate a fairness mitigation. Our findings highlight the importance of considering causal structure when choosing and enforcing fairness criteria."
Researcher Affiliation: Collaboration — Maggie Makar (Computer Science and Engineering, University of Michigan, Ann Arbor, MI); Alexander D'Amour (Google Research, Cambridge, MA)
Pseudocode: No — The paper gives mathematical formulations of its learning objectives (Equations 3, 4, 5) but does not present them in a structured pseudocode or algorithm block; the steps are described in paragraph text.
Open Source Code: Yes — "We used publicly available code provided by the authors in Makar et al. (2021)." https://github.com/mymakar/causally_motivated_shortcut_removal
Open Datasets: Yes — "We conduct this analysis on a publicly available dataset, CheXpert (Irvin et al., 2019)."
Dataset Splits: Yes — "We split the dataset into 70% of the examples for training and validation, while the rest is held out for testing. Specifically, we split the training and validation data into 75% for training and 25% for validation. We further split the validation data into 5 folds."
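The split scheme quoted above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the function name, seed handling, and use of NumPy are assumptions.

```python
import numpy as np

def make_splits(n_examples, seed=0):
    """Sketch of the reported scheme: 70% train+validation vs. 30% test,
    then 75/25 train/validation, with validation further cut into 5 folds.
    Illustrative only; not the authors' implementation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)

    n_trainval = int(0.7 * n_examples)          # 70% for train+validation
    trainval, test = idx[:n_trainval], idx[n_trainval:]

    n_train = int(0.75 * len(trainval))         # 75/25 within train+validation
    train, val = trainval[:n_train], trainval[n_train:]

    val_folds = np.array_split(val, 5)          # 5 validation folds
    return train, val_folds, test
```

For 1,000 examples this yields 525 training, 175 validation (in 5 folds), and 300 test indices.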
Hardware Specification: No — The paper mentions using DenseNet-121, pretrained on ImageNet, fine-tuned for the task, and implemented in TensorFlow. However, it does not provide any specific details about the hardware (e.g., GPU model, CPU type) used to run the experiments.
Software Dependencies: No — "All models are implemented in TensorFlow (Abadi et al., 2015)." The paper does not provide version numbers for TensorFlow or any other software libraries used.
Experiment Setup: Yes — "We train all models using a batch size of 64, and image size 256×256, for 50 epochs. For the three MMD-based models, we need to pick the free parameter α, which controls how strictly we enforce the MMD penalty, and γ, which is the kernel bandwidth needed to compute the MMD. We follow the cross-validation procedure outlined in Makar et al. (2021). Specifically, we split the training and validation data into 75% for training and 25% for validation. We further split the validation data into 5 folds. We compute the MMD on each of the folds. For the DNN, we perform L2-regularization. We pick the regularization parameter based on the validation loss."
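To make the roles of α and γ concrete, here is a minimal sketch of an RBF-kernel squared-MMD estimator and an α-weighted penalty. The function names and the biased estimator form are assumptions for illustration; the exact penalty used by Makar et al. (2021) may differ.

```python
import numpy as np

def rbf_mmd2(x, y, gamma):
    """Biased estimator of squared MMD between samples x and y under an
    RBF kernel with bandwidth parameter gamma. Generic sketch, not the
    authors' implementation."""
    def k(a, b):
        # pairwise squared Euclidean distances, then RBF kernel
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def penalized_loss(task_loss, feats_group0, feats_group1, alpha, gamma):
    """Task loss plus an MMD penalty between the two groups' representations.
    alpha controls how strictly the penalty is enforced, as in the report."""
    return task_loss + alpha * rbf_mmd2(feats_group0, feats_group1, gamma)
```

Larger α pushes the two groups' representations to match more closely; γ sets the length scale at which differences between the samples are measured.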