On the Discrimination Risk of Mean Aggregation Feature Imputation in Graphs

Authors: Arjun Subramonian, Kai-Wei Chang, Yizhou Sun

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically evaluate the fairness and accuracy of our solution on synthetic and real-world credit networks, finding that it improves fairness without a significant loss in reconstruction error on the synthetic datasets but doesn't improve fairness on the real-world datasets.
Researcher Affiliation Academia Arjun Subramonian, UCLA, arjunsub@cs.ucla.edu; Kai-Wei Chang, UCLA, kwchang@cs.ucla.edu; Yizhou Sun, UCLA, yzsun@cs.ucla.edu
Pseudocode Yes We now present a theorem that shows how to perform mean aggregation feature imputation with a discrimination risk of at most $\epsilon$ when the known feature values remain fixed. Theorem 3 ($\epsilon$-Fair Imputation, $\beta = 0$): Vanilla mean aggregation feature imputation updates $X_U^{(t+1)} := X_U^{(t)} - \gamma\,(\Delta_{UU} X_U^{(t)} + \Delta_{UK} X_K) = Z_U^{(t)}$, where $\gamma = 1$. Let $\epsilon$-fair mean aggregation feature imputation instead update $X_U^{(t+1)} := P_W Z_U^{(t)} + P_B$, where
$$P_W = \begin{cases} I_{|U|}, & R_K - \epsilon \le c^\top Z_U^{(t)} \le R_K + \epsilon \\ I_{|U|} - \frac{c c^\top}{c^\top c}, & \text{otherwise}, \end{cases} \qquad P_B = \frac{c}{c^\top c} \cdot \begin{cases} R_K - \epsilon, & c^\top Z_U^{(t)} < R_K - \epsilon \\ R_K + \epsilon, & c^\top Z_U^{(t)} > R_K + \epsilon \\ 0, & \text{otherwise}, \end{cases}$$
and $c \in \mathbb{R}^{|U|}$ satisfies $c^\top Z_U^{(t)} = \frac{1}{|Q|} \sum_{q \in Q \cap U} Z_q^{(t)} - \frac{1}{|R|} \sum_{r \in R \cap U} Z_r^{(t)}$.
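For a concrete picture of this update, the following is a minimal PyTorch sketch of one $\epsilon$-fair correction step for a single scalar feature channel. It is an interpretation of the theorem quoted above, not the authors' released implementation; all names (epsilon_fair_step, z_u, r_k, ...) are illustrative assumptions.

```python
import torch

def epsilon_fair_step(z_u, q_mask, r_mask, r_k, eps, n_q, n_r):
    """One epsilon-fair correction of the propagated values Z_U^(t) (sketch).

    z_u      : (|U|,) vanilla mean-aggregation (propagated) values for unknown nodes
    q_mask   : (|U|,) boolean mask of unknown nodes in the marginalized group Q
    r_mask   : (|U|,) boolean mask of unknown nodes in the dominant group R
    r_k      : scalar target R_K derived from the fixed, known feature values
    eps      : allowed discrimination risk
    n_q, n_r : group sizes |Q| and |R| used to normalize the group contributions
    """
    # c encodes the normalized difference of group contributions, so that
    # c^T z_u is the gap the theorem constrains to lie in [R_K - eps, R_K + eps].
    c = q_mask.float() / n_q - r_mask.float() / n_r
    gap = torch.dot(c, z_u)

    if r_k - eps <= gap <= r_k + eps:
        # Already within the epsilon band: P_W = I and P_B = 0.
        return z_u

    # Otherwise remove the component along c and add back the nearest band boundary.
    target = r_k - eps if gap < r_k - eps else r_k + eps
    c_norm_sq = torch.dot(c, c)
    # (I - cc^T / c^T c) z_u  +  c * target / (c^T c)
    return z_u - c * gap / c_norm_sq + c * target / c_norm_sq
```

After the correction, $c^\top X_U^{(t+1)}$ equals the nearest boundary of the $\epsilon$ band, which is how the sketch realizes the bounded discrimination risk stated in the theorem.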
Open Source Code Yes All our code may be found in the supplementary material.
Open Datasets Yes We construct undirected two-block synthetic networks (SBM) using the StochasticBlockModelDataset from PyTorch Geometric [83] (where one block corresponds to the marginalized group Q and the other block to the dominant group R) with various (relative) group sizes and inter- and intra-link rates (more information in Section B.1). SBM does not have a corresponding task, i.e., the nodes do not have labels. We also use the real-world Credit defaulter and German credit networks from [29] (there exist limited natively graph real-world datasets with sensitive attributes available).
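For reference, a two-block dataset of this kind can be built with PyTorch Geometric's StochasticBlockModelDataset. The block sizes, link rates, and feature dimensionality below are placeholders, not the paper's configurations (those are listed in its Section B.1).

```python
from torch_geometric.datasets import StochasticBlockModelDataset

# Two blocks: a marginalized group Q and a dominant group R.
block_sizes = [300, 700]              # placeholder |Q|, |R|
edge_probs = [[0.05, 0.01],           # intra-Q and Q-R link rates (placeholders)
              [0.01, 0.05]]           # R-Q and intra-R link rates (placeholders)

dataset = StochasticBlockModelDataset(
    root='data/sbm',
    block_sizes=block_sizes,
    edge_probs=edge_probs,
    num_channels=8,                   # generate 8 synthetic node features
)
data = dataset[0]                     # data.y holds the block (group) assignment
```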
Dataset Splits No The paper states that models are trained, validated, and tested, and that 'new splits are created' for each run, but it does not provide specific percentages or counts for these splits.
Hardware Specification Yes All experiments are run on a Linux workstation with an Intel i7-11700F CPU and NVIDIA RTX 3090 GPU.
Software Dependencies No The paper mentions using PyTorch and PyTorch Geometric, but does not specify their version numbers.
Experiment Setup Yes All models are trained for 200 epochs using the Adam optimizer [98] with a learning rate of 0.001 and a weight decay of 0.0005. Dropout with a rate of 0.5 [99] is applied to the two-layer MLP and GCN.
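Below is a minimal sketch of that reported configuration (200 epochs, Adam, learning rate 0.001, weight decay 0.0005, dropout 0.5) for a two-layer GCN in PyTorch Geometric. The placeholder graph, hidden size, and task are assumptions for illustration; the paper's full training code is in its supplementary material.

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Placeholder graph: a real run would load the Credit / German credit networks instead.
num_nodes, num_feats, num_classes = 100, 8, 2
data = Data(
    x=torch.randn(num_nodes, num_feats),
    edge_index=torch.randint(0, num_nodes, (2, 400)),
    y=torch.randint(0, num_classes, (num_nodes,)),
    train_mask=torch.rand(num_nodes) < 0.6,
)

class GCN(torch.nn.Module):
    """Two-layer GCN with dropout 0.5, mirroring the reported setup."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

model = GCN(num_feats, 16, num_classes)  # hidden size 16 is a guess, not from the paper
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```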