Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fair Normalizing Flows
Authors: Mislav Balunović, Anian Ruoss, Martin Vechev
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate the effectiveness of FNF in enforcing various group fairness notions, as well as other attractive properties such as interpretability and transfer learning, on a variety of challenging real-world datasets. |
| Researcher Affiliation | Collaboration | Mislav Balunović (ETH Zurich), Anian Ruoss (ETH Zurich, DeepMind), Martin Vechev (ETH Zurich) |
| Pseudocode | Yes | Algorithm 1 Learning Fair Normalizing Flows |
| Open Source Code | Yes | We make all of our code publicly available at https://github.com/eth-sri/fnf. |
| Open Datasets | Yes | We consider UCI Adult and Crime (Dua & Graff, 2017), Compas (Angwin et al., 2016), Law School (Wightman, 2017), and the Health Heritage dataset. |
| Dataset Splits | Yes | For each dataset, we first split the data into training and test set, using the original splits wherever possible and an 80% / 20% split of the original dataset otherwise. We then further sample 20% of the training set to be used as validation set. |
| Hardware Specification | Yes | We run all experiments on a desktop PC using a single GeForce RTX 2080 Ti GPU and 16-core Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz. |
| Software Dependencies | Yes | Our code is implemented in PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | Crime and Law use batch size 128, initial learning rate 0.01 and weight decay 0.0001, while Health uses batch size 256, initial learning rate 0.001 and weight decay 0. Training is performed using Adam (Kingma & Ba, 2015) optimizer. We use 60, 100, and 80 epochs for Crime, Law and Health, respectively. |
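The split scheme and per-dataset hyperparameters quoted in the table can be sketched as follows. This is a minimal illustration, not the authors' code (available at the repository linked above): the function name `split_indices`, the fixed seed, and the example dataset size of 1000 are assumptions for demonstration; only the percentages, batch sizes, learning rates, weight decays, and epoch counts come from the quoted text.

```python
import random

def split_indices(n, seed=0):
    """Sketch of the quoted split scheme: 80%/20% train/test,
    then 20% of the training set held out as validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # seed is an assumption
    n_test = int(0.2 * n)                     # 20% test
    test, train = idx[:n_test], idx[n_test:]
    n_val = int(0.2 * len(train))             # 20% of train as validation
    val, train = train[:n_val], train[n_val:]
    return train, val, test

# Per-dataset hyperparameters as quoted in the Experiment Setup row
# (training uses the Adam optimizer).
HPARAMS = {
    "crime":  {"batch_size": 128, "lr": 0.01,  "weight_decay": 1e-4, "epochs": 60},
    "law":    {"batch_size": 128, "lr": 0.01,  "weight_decay": 1e-4, "epochs": 100},
    "health": {"batch_size": 256, "lr": 0.001, "weight_decay": 0.0,  "epochs": 80},
}

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 640 160 200
```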