DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks
Authors: Boris van Breugel, Trent Kyono, Jeroen Berrevoets, Mihaela van der Schaar
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we show that DECAF successfully removes undesired bias and, in contrast to existing methods, is capable of generating high-quality synthetic data. Experimentally, we show how DECAF is compatible with several fairness/discrimination definitions used in the literature while still maintaining high downstream utility of generated data. |
| Researcher Affiliation | Academia | Boris van Breugel, University of Cambridge, bv292@cam.ac.uk; Trent Kyono, University of California, Los Angeles, tmkyono@ucla.edu; Jeroen Berrevoets, University of Cambridge, jb2384@cam.ac.uk; Mihaela van der Schaar, University of Cambridge / University of California, Los Angeles / The Alan Turing Institute, mv472@cam.ac.uk |
| Pseudocode | No | The paper describes the method DECAF in detail in Section 5 and illustrates its architecture in Figure 2, but it does not include any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | PyTorch Lightning source code at https://github.com/vanderschaarlab/DECAF. |
| Open Datasets | Yes | We experiment on the Adult dataset [40], with known bias between gender and income [10, 11]. We use the Credit Approval dataset from [40]. [40] Dheeru Dua and Casey Graff. UCI machine learning repository, 2020. URL http://archive.ics.uci.edu/ml. (A loading sketch for the Adult dataset follows the table.) |
| Dataset Splits | No | The paper mentions training data and models but does not provide specific details on training/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) in the main body or referenced appendices. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch Lightning' and 'Tetrad [42]' but does not provide specific version numbers for these or any other software dependencies. It only cites the year of Tetrad's release. |
| Experiment Setup | Yes | For the MLP, we use a single hidden layer of size 100 with ReLU activation and Adam optimizer (learning rate 0.001, betas (0.9, 0.999), epsilon 1e-08, weight decay 0). (A sketch of this configuration follows the table.) |
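For context on the Open Datasets row, the UCI Adult dataset cited as [40] is also mirrored on OpenML. A minimal loading sketch, assuming scikit-learn's `fetch_openml` (this is not the paper's released data pipeline):

```python
# Minimal sketch: load the UCI Adult dataset via its OpenML mirror.
# Assumes scikit-learn; this is NOT the paper's own data pipeline.
from sklearn.datasets import fetch_openml

# "adult" on OpenML mirrors the UCI Adult (Census Income) dataset.
adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target  # y is the binarized income label

# The paper studies bias between the protected attribute (sex) and income.
print(X["sex"].value_counts())
print(y.value_counts())
```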
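The Experiment Setup row quotes a concrete downstream-MLP configuration. A minimal PyTorch sketch of that configuration follows; only the hidden-layer size, activation, and Adam hyperparameters come from the paper, while the input/output dimensions are placeholder assumptions:

```python
# Sketch of the downstream-evaluation MLP described in the setup quote.
# Input/output dimensions are placeholders; only the hidden layer size,
# activation, and Adam hyperparameters are taken from the paper.
import torch
import torch.nn as nn

n_features = 14  # placeholder: e.g. the Adult dataset's feature count
model = nn.Sequential(
    nn.Linear(n_features, 100),  # single hidden layer of size 100
    nn.ReLU(),                   # ReLU activation
    nn.Linear(100, 1),           # binary-classification head (assumed)
)

# Adam optimizer with the hyperparameters stated in the paper.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=0,
)
```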