Fairness and Accuracy under Domain Generalization

Authors: Thai-Hoang Pham, Xueru Zhang, Ping Zhang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on real-world data validate the proposed algorithm.
Researcher Affiliation | Academia | Thai-Hoang Pham, Xueru Zhang, Ping Zhang; The Ohio State University, Columbus, OH 43210, USA; {pham.375, zhang.12807, zhang.10631}@osu.edu
Pseudocode | Yes | Algorithm 1: Fairness and Accuracy Transfer by Density Matching (FATDM)
Open Source Code | Yes | Model implementation is available at https://github.com/pth1993/FATDM.
Open Datasets | Yes | The original chest X-ray images and the corresponding metadata can be downloaded from PhysioNet (https://physionet.org/content/mimic-cxr-jpg/2.0.0/; https://physionet.org/content/mimiciv/2.0/).
Dataset Splits | Yes | We follow the leave-one-out domain setting, in which 3 domains are used for training and the remaining domain serves as the unseen target domain used for evaluation. ... 10% of training data is used for validation. Each model is trained for 10 epochs and the results are from the epoch with the best performance on the validation set.
Hardware Specification | Yes | Models (FATDM and baselines) are implemented with PyTorch version 1.11 and trained on multiple compute nodes (each model instance is trained on a single node with 4 CPUs, 8 GB of memory, and a single GPU (P100 or V100)).
Software Dependencies | Yes | Models (FATDM and baselines) are implemented with PyTorch version 1.11.
Experiment Setup | Yes | ω (the hyperparameter controlling the accuracy-fairness trade-off) varies from 0 to 10 with step sizes 0.0002 for ω ∈ [0, 0.002], 0.002 for ω ∈ [0.002, 0.1], and 0.2 for ω ∈ [0.2, 10]; γ (the hyperparameter controlling the accuracy-invariance trade-off) is set to 0.1 after hyperparameter tuning. Models (FATDM and baselines) are implemented with PyTorch version 1.11 and trained on multiple compute nodes (each model instance is trained on a single node with 4 CPUs, 8 GB of memory, and a single GPU (P100 or V100)). One domain's data is used for testing and the other domains' data for training (10% of training data is used for validation). Each model is trained for 10 epochs and the results are from the epoch with the best performance on the validation set.
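The ω sweep above uses different step sizes in different sub-ranges. A minimal sketch of how such a non-uniform grid could be constructed is below; the segment endpoints are the values quoted in the setup, but the exact grid used in the paper's code is an assumption.

```python
import numpy as np

def omega_grid():
    """Build the non-uniform omega sweep: fine steps near 0, coarse steps up to 10.

    Segment boundaries follow the quoted setup; how the paper's code joins
    the segments is an assumption.
    """
    seg1 = np.arange(0.0, 0.002, 0.0002)     # step 0.0002 on [0, 0.002)
    seg2 = np.arange(0.002, 0.1, 0.002)      # step 0.002 on [0.002, 0.1)
    seg3 = np.arange(0.2, 10.0 + 1e-9, 0.2)  # step 0.2 on [0.2, 10]
    return np.unique(np.concatenate([seg1, seg2, seg3]))

grid = omega_grid()
```

The small `1e-9` slack on the last segment guards against floating-point drift in `np.arange`, so the endpoint 10 is included in the sweep.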
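The leave-one-out domain protocol described above can be sketched as follows: each domain takes a turn as the unseen target, the remaining domains form the training pool, and 10% of the pooled training data is held out for validation. Domain names, the dummy per-domain example count, and the seed are illustrative assumptions, not values from the paper.

```python
import random

def leave_one_domain_out(domains, n_per_domain=100, val_frac=0.1, seed=0):
    """Yield (target_domain, train_set, val_set) folds.

    Each fold holds out one domain entirely for evaluation and splits the
    pooled data from the remaining domains 90/10 into train/validation.
    """
    rng = random.Random(seed)
    for target in domains:
        train_pool = [d for d in domains if d != target]
        # Dummy (domain, index) pairs stand in for real examples.
        examples = [(d, i) for d in train_pool for i in range(n_per_domain)]
        rng.shuffle(examples)
        n_val = int(len(examples) * val_frac)
        val, train = examples[:n_val], examples[n_val:]
        yield target, train, val

for target, train, val in leave_one_domain_out(["A", "B", "C", "D"]):
    print(target, len(train), len(val))  # 3 domains pooled, 10% held out
```

With four domains, each fold trains on the other three and evaluates on the held-out one, matching the "3 domains for training, 1 unseen target" setting quoted in the report.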