Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fairness and robustness in anti-causal prediction
Authors: Maggie Makar, Alexander D'Amour
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using a medical dataset, we empirically validate our findings on the task of detecting pneumonia from X-rays, in a setting where differences in prevalence across sex groups motivate a fairness mitigation. Our findings highlight the importance of considering causal structure when choosing and enforcing fairness criteria. |
| Researcher Affiliation | Collaboration | Maggie Makar EMAIL Computer Science and Engineering University of Michigan Ann Arbor, MI Alexander D'Amour EMAIL Google Research Cambridge, MA |
| Pseudocode | No | The paper describes mathematical formulations and strategies for learning objectives (Equations 3, 4, 5) but does not present them in a structured pseudocode or algorithm block format. The steps are described in paragraph text. |
| Open Source Code | Yes | We used publicly available code provided by the authors in Makar et al. (2021). https://github.com/mymakar/causally_motivated_shortcut_removal |
| Open Datasets | Yes | We conduct this analysis on a publicly available dataset, CheXpert (Irvin et al., 2019). |
| Dataset Splits | Yes | We split the dataset into 70% examples used for training and validation, while the rest is held out for testing. Specifically, we split the training and validation data into 75% for training and 25% for validation. We further split the validation data into 5 folds. |
| Hardware Specification | No | The paper mentions using DenseNet-121, pretrained on ImageNet, fine-tuned for the task, and implemented in TensorFlow. However, it does not provide any specific details about the hardware (e.g., GPU model, CPU type) used for running the experiments. |
| Software Dependencies | No | All models are implemented in TensorFlow (Abadi et al., 2015). The paper does not provide specific version numbers for TensorFlow or any other software libraries used. |
| Experiment Setup | Yes | We train all models using a batch size of 64, and image sizes 256×256, for 50 epochs. For the three MMD-based models, we need to pick the free parameter α, which controls how strictly we enforce the MMD penalty, and γ, which is the kernel bandwidth needed to compute the MMD. We follow the cross-validation procedure outlined in Makar et al. (2021). Specifically, we split the training and validation data into 75% for training and 25% for validation. We further split the validation data into 5 folds. We compute the MMD on each of the folds. For the DNN, we perform L2-regularization. We pick the regularization parameter based on the validation loss. |
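The split procedure and MMD penalty quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation (which is in the linked repository): the helper names `three_way_split` and `rbf_mmd2` are hypothetical, and the RBF-kernel biased MMD estimate is one common choice of MMD estimator, assumed here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

def three_way_split(n, test_frac=0.30, val_frac=0.25, n_folds=5):
    """Index split following the quoted procedure: 70/30 into
    (train+val)/test, then 75/25 into train/val, then the
    validation indices into 5 folds."""
    idx = rng.permutation(n)
    n_test = int(n * test_frac)
    test, train_val = idx[:n_test], idx[n_test:]
    n_val = int(len(train_val) * val_frac)
    val, train = train_val[:n_val], train_val[n_val:]
    folds = np.array_split(val, n_folds)
    return train, folds, test

def rbf_mmd2(x, y, gamma):
    """Biased squared-MMD estimate between samples x and y,
    using an RBF kernel with bandwidth parameter gamma."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

# Sketch of the model-selection loop: for each candidate (alpha, gamma),
# the MMD between group representations would be evaluated on each
# of the validation folds.
train, val_folds, test = three_way_split(1000)
```

In the paper's setting, `x` and `y` would be representations of examples from the two sex groups, and the per-fold MMD values inform the choice of the penalty weight α and bandwidth γ.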