DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks

Authors: Boris van Breugel, Trent Kyono, Jeroen Berrevoets, Mihaela van der Schaar

NeurIPS 2021

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the LLM's response.

Research Type: Experimental
LLM response: In our experiments, we show that DECAF successfully removes undesired bias and, in contrast to existing methods, is capable of generating high-quality synthetic data. Experimentally, we show how DECAF is compatible with several fairness/discrimination definitions used in the literature while still maintaining high downstream utility of the generated data.

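Demographic parity is one such fairness definition from the literature. Below is a minimal sketch of how it can be checked on model predictions; it is not taken from the paper, and the function and variable names are illustrative.

```python
import numpy as np

def demographic_parity_gap(y_pred, a):
    """Absolute difference in positive-prediction rates across a binary
    protected attribute `a` (0/1). A gap near 0 indicates demographic parity."""
    y_pred, a = np.asarray(y_pred), np.asarray(a)
    rate_a1 = y_pred[a == 1].mean()  # P(Y_hat = 1 | A = 1)
    rate_a0 = y_pred[a == 0].mean()  # P(Y_hat = 1 | A = 0)
    return abs(rate_a1 - rate_a0)

# Illustrative usage with random predictions and a random protected attribute:
rng = np.random.default_rng(0)
y_hat = rng.integers(0, 2, size=1000)
sex = rng.integers(0, 2, size=1000)
print(demographic_parity_gap(y_hat, sex))
```
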
Researcher Affiliation: Academia
LLM response: Boris van Breugel, University of Cambridge, bv292@cam.ac.uk; Trent Kyono, University of California, Los Angeles, tmkyono@ucla.edu; Jeroen Berrevoets, University of Cambridge, jb2384@cam.ac.uk; Mihaela van der Schaar, University of Cambridge, University of California, Los Angeles, and The Alan Turing Institute, mv472@cam.ac.uk.

Pseudocode: No
LLM response: The paper describes the DECAF method in detail in Section 5 and illustrates its architecture in Figure 2, but it does not include structured pseudocode or clearly labeled algorithm blocks.

Open Source Code: Yes
LLM response: PyTorch Lightning source code is available at https://github.com/vanderschaarlab/DECAF.

Open Datasets: Yes
LLM response: We experiment on the Adult dataset [40], with known bias between gender and income [10, 11]. We use the Credit Approval dataset from [40]. [40] Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2020. URL http://archive.ics.uci.edu/ml.

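For context, a minimal sketch of loading the Adult dataset. The paper pulls it from the UCI repository [40]; this sketch instead uses scikit-learn's OpenML mirror (an assumption, not the authors' loading code), whose column names follow the OpenML version.

```python
from sklearn.datasets import fetch_openml

# Adult ("census income") dataset; OpenML mirrors the UCI version [40].
adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target           # features and income label (<=50K / >50K)
print(X.shape, y.value_counts())
print(X["sex"].value_counts())            # protected attribute with known bias [10, 11]
```
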
Dataset Splits: No
LLM response: The paper mentions training data and models but does not provide specific details on training/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) in the main body or the referenced appendices.

Hardware Specification: No
LLM response: The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory, or cloud instance types) used to run its experiments.

Software Dependencies: No
LLM response: The paper mentions using PyTorch Lightning and Tetrad [42] but does not provide version numbers for these or any other software dependencies; it cites only the year of Tetrad's release.

Experiment Setup: Yes
LLM response: For the MLP, we use a single hidden layer of size 100 with ReLU activation and the Adam optimizer (learning rate 0.001, betas (0.9, 0.999), epsilon 1e-08, weight decay 0).
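
A minimal PyTorch sketch of an MLP matching that description. The input/output dimensions and the training step are assumptions for illustration, not taken from the paper; the stated Adam hyperparameters happen to match PyTorch's defaults.

```python
import torch
import torch.nn as nn

# Classifier as described: one hidden layer of 100 units with ReLU activation.
# Input/output sizes are placeholders (e.g., encoded features, binary label).
n_features, n_classes = 108, 2
mlp = nn.Sequential(
    nn.Linear(n_features, 100),
    nn.ReLU(),
    nn.Linear(100, n_classes),
)

# Adam with the stated hyperparameters.
optimizer = torch.optim.Adam(
    mlp.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0
)

# One illustrative gradient step on random data.
x = torch.randn(32, n_features)
y = torch.randint(0, n_classes, (32,))
loss = nn.CrossEntropyLoss()(mlp(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```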