Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fairness and Accuracy under Domain Generalization
Authors: Thai-Hoang Pham, Xueru Zhang, Ping Zhang
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real-world data validate the proposed algorithm. |
| Researcher Affiliation | Academia | Thai-Hoang Pham, Xueru Zhang, Ping Zhang The Ohio State University, Columbus, OH 43210, USA EMAIL |
| Pseudocode | Yes | Algorithm 1: Fairness and Accuracy Transfer by Density Matching (FATDM) |
| Open Source Code | Yes | Model implementation is available at https://github.com/pth1993/FATDM. |
| Open Datasets | Yes | The original chest X-ray images and the corresponding metadata can be downloaded from PhysioNet (https://physionet.org/content/mimic-cxr-jpg/2.0.0/; https://physionet.org/content/mimiciv/2.0/). |
| Dataset Splits | Yes | We follow leave-one-out domain setting in which 3 domains are used for training and the remaining domain serves as the unseen target domain and is used for evaluation. ... 10% of training data is used for validation. Each model is trained with 10 epoches and the results are from the epoch with best performance on the validation set. |
| Hardware Specification | Yes | Models (FATDM and baselines) are implemented by PyTorch library version 1.11 and is trained on multiple computer nodes (each model instance is trained on a single node which has 4 CPUs, 8GB of memory, and a single GPU (P100 or V100)). |
| Software Dependencies | Yes | Models (FATDM and baselines) are implemented by PyTorch library version 1.11 |
| Experiment Setup | Yes | ω (hyper-parameter that controls accuracy-fairness trade-off) varies from 0 to 10 with step sizes 0.0002 for ω ∈ [0, 0.002], 0.002 for ω ∈ [0.002, 0.1] and 0.2 for ω ∈ [0.2, 10], and γ (hyper-parameter that controls accuracy-invariance trade-off) is set to 0.1 (after hyper-parameter tuning). Models (FATDM and baselines) are implemented by PyTorch library version 1.11 and is trained on multiple computer nodes (each model instance is trained on a single node which has 4 CPUs, 8GB of memory, and a single GPU (P100 or V100)). One domain's data is used for testing and the other domains' data is used for training (10% of training data is used for validation). Each model is trained with 10 epoches and the results are from the epoch with best performance on the validation set. |
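
The ω sweep and the leave-one-out domain protocol quoted in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation (which is linked in the Open Source Code row). The function names `omega_grid` and `leave_one_domain_out`, the dictionary format for domains, the fixed seed, and the choice to draw the 10% validation set from the pooled and shuffled training domains are all hypothetical. The ranges follow the quoted text as written, so no values are generated between 0.1 and 0.2.

```python
import random


def omega_grid():
    """Candidate values of ω, built from the three ranges quoted in the table.

    Integer multiples are rounded to 4 decimals so the shared endpoint 0.002
    is deduplicated cleanly instead of differing by floating-point noise.
    """
    seg1 = [round(k * 0.0002, 4) for k in range(0, 11)]  # [0, 0.002], step 0.0002
    seg2 = [round(k * 0.002, 4) for k in range(1, 51)]   # [0.002, 0.1], step 0.002
    seg3 = [round(k * 0.2, 4) for k in range(1, 51)]     # [0.2, 10], step 0.2
    return sorted(set(seg1 + seg2 + seg3))


def leave_one_domain_out(domains, val_fraction=0.1, seed=0):
    """Yield (target, train_ids, val_ids, test_ids), one tuple per held-out domain.

    `domains` maps a domain name to a list of sample ids (hypothetical format).
    Assumption: validation samples are drawn from the pooled training domains.
    """
    rng = random.Random(seed)
    for target in domains:
        # Pool every sample that does not belong to the held-out domain.
        pool = [s for name, ids in domains.items() if name != target for s in ids]
        rng.shuffle(pool)
        n_val = int(round(val_fraction * len(pool)))
        yield target, pool[n_val:], pool[:n_val], list(domains[target])


if __name__ == "__main__":
    grid = omega_grid()
    print(len(grid), grid[0], grid[-1])
    toy = {name: [f"{name}{i}" for i in range(100)] for name in "ABCD"}
    for target, train, val, test in leave_one_domain_out(toy):
        print(target, len(train), len(val), len(test))
```

With four domains, each iteration holds one domain out as the unseen target and splits the other three into training and validation parts; model selection by best validation epoch would then happen inside each iteration.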