On the Discrimination Risk of Mean Aggregation Feature Imputation in Graphs

Authors: Arjun Subramonian, Kai-Wei Chang, Yizhou Sun

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically evaluate the fairness and accuracy of our solution on synthetic and real-world credit networks, finding that it improves fairness without a significant loss in reconstruction error on the synthetic datasets but doesn't improve fairness on the real-world datasets.
Researcher Affiliation Academia Arjun Subramonian, UCLA, arjunsub@cs.ucla.edu; Kai-Wei Chang, UCLA, kwchang@cs.ucla.edu; Yizhou Sun, UCLA, yzsun@cs.ucla.edu
Pseudocode Yes We now present a theorem that shows how to perform mean aggregation feature imputation with a discrimination risk of at most $\epsilon$ when the known feature values remain fixed. Theorem 3 ($\epsilon$-Fair Imputation, $\beta = 0$): Vanilla mean aggregation feature imputation updates $X_U^{(t+1)} := X_U^{(t)} - \gamma\,(\Delta_{UU} X_U^{(t)} + \Delta_{UK} X_K) = Z_U^{(t)}$, where $\gamma = 1$. Let $\epsilon$-fair mean aggregation feature imputation instead update $X_U^{(t+1)} := P_W Z_U^{(t)} + P_B$, where
$$P_W = \begin{cases} I_{|U|}, & R_K - \epsilon \le c^\top Z_U^{(t)} \le R_K + \epsilon \\ I_{|U|} - \frac{c c^\top}{c^\top c}, & \text{otherwise}, \end{cases} \qquad P_B = \frac{c}{c^\top c} \cdot \begin{cases} R_K - \epsilon, & c^\top Z_U^{(t)} < R_K - \epsilon \\ R_K + \epsilon, & c^\top Z_U^{(t)} > R_K + \epsilon \\ 0, & \text{otherwise}, \end{cases}$$
and $c \in \mathbb{R}^{|U|}$ satisfies $c^\top Z_U^{(t)} = \frac{1}{|Q|} \sum_{q \in Q \cap U} Z_q^{(t)} - \frac{1}{|R|} \sum_{r \in R \cap U} Z_r^{(t)}$.
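For a concrete picture of this update, the following is a minimal PyTorch sketch of one $\epsilon$-fair correction step for a single scalar feature channel. It is an interpretation of the theorem quoted above, not the authors' released implementation; all names (epsilon_fair_step, z_u, r_k, ...) are illustrative assumptions.

```python
import torch

def epsilon_fair_step(z_u, q_mask, r_mask, r_k, eps, n_q, n_r):
    """One epsilon-fair correction of the propagated values Z_U^(t) (sketch).

    z_u      : (|U|,) vanilla mean-aggregation (propagated) values for unknown nodes
    q_mask   : (|U|,) boolean mask of unknown nodes in the marginalized group Q
    r_mask   : (|U|,) boolean mask of unknown nodes in the dominant group R
    r_k      : scalar target R_K derived from the fixed, known feature values
    eps      : allowed discrimination risk
    n_q, n_r : group sizes |Q| and |R| used to normalize the group contributions
    """
    # c encodes the normalized difference of group contributions, so that
    # c^T z_u is the gap the theorem constrains to lie in [R_K - eps, R_K + eps].
    c = q_mask.float() / n_q - r_mask.float() / n_r
    gap = torch.dot(c, z_u)

    if r_k - eps <= gap <= r_k + eps:
        # Already within the epsilon band: P_W = I and P_B = 0.
        return z_u

    # Otherwise remove the component along c and add back the nearest band boundary.
    target = r_k - eps if gap < r_k - eps else r_k + eps
    c_norm_sq = torch.dot(c, c)
    # (I - cc^T / c^T c) z_u  +  c * target / (c^T c)
    return z_u - c * gap / c_norm_sq + c * target / c_norm_sq
```

After the correction, $c^\top X_U^{(t+1)}$ equals the nearest boundary of the $\epsilon$ band, which is how the sketch realizes the bounded discrimination risk stated in the theorem.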
Open Source Code Yes All our code may be found in the supplementary material.
Open Datasets Yes We construct undirected two-block synthetic networks (SBM) using the StochasticBlockModelDataset from PyTorch Geometric [83] (where one block corresponds to the marginalized group Q and the other block to the dominant group R) with various (relative) group sizes and inter- and intra-link rates (more information in Section B.1). SBM does not have a corresponding task, i.e., the nodes do not have labels. We also use the real-world Credit defaulter and German credit networks from [29] (there exist limited natively graph real-world datasets with sensitive attributes available).
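For reference, a two-block dataset of this kind can be built with PyTorch Geometric's StochasticBlockModelDataset. The block sizes, link rates, and feature dimensionality below are placeholders, not the paper's configurations (those are listed in its Section B.1).

```python
from torch_geometric.datasets import StochasticBlockModelDataset

# Two blocks: a marginalized group Q and a dominant group R.
block_sizes = [300, 700]              # placeholder |Q|, |R|
edge_probs = [[0.05, 0.01],           # intra-Q and Q-R link rates (placeholders)
              [0.01, 0.05]]           # R-Q and intra-R link rates (placeholders)

dataset = StochasticBlockModelDataset(
    root='data/sbm',
    block_sizes=block_sizes,
    edge_probs=edge_probs,
    num_channels=8,                   # generate 8 synthetic node features
)
data = dataset[0]                     # data.y holds the block (group) assignment
```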
Dataset Splits No The paper states that models are trained, validated, and tested, and that 'new splits are created' for each run, but it does not provide specific percentages or counts for these splits.
Hardware Specification Yes All experiments are run on a Linux workstation with an Intel i7-11700F CPU and NVIDIA RTX 3090 GPU.
Software Dependencies No The paper mentions using PyTorch and PyTorch Geometric, but does not specify their version numbers.
Experiment Setup Yes All models are trained for 200 epochs using the Adam optimizer [98] with a learning rate of 0.001 and a weight decay of 0.0005. Dropout with a rate of 0.5 [99] is applied to the two-layer MLP and GCN.
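Below is a minimal sketch of that reported configuration (200 epochs, Adam, learning rate 0.001, weight decay 0.0005, dropout 0.5) for a two-layer GCN in PyTorch Geometric. The placeholder graph, hidden size, and task are assumptions for illustration; the paper's full training code is in its supplementary material.

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Placeholder graph: a real run would load the Credit / German credit networks instead.
num_nodes, num_feats, num_classes = 100, 8, 2
data = Data(
    x=torch.randn(num_nodes, num_feats),
    edge_index=torch.randint(0, num_nodes, (2, 400)),
    y=torch.randint(0, num_classes, (num_nodes,)),
    train_mask=torch.rand(num_nodes) < 0.6,
)

class GCN(torch.nn.Module):
    """Two-layer GCN with dropout 0.5, mirroring the reported setup."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

model = GCN(num_feats, 16, num_classes)  # hidden size 16 is a guess, not from the paper
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```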