EqGNN: Equalized Node Opportunity in Graphs

Authors: Uriel Singer, Kira Radinsky
Pages: 8333-8341

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our classifier over several graph datasets and sensitive attributes and show our algorithm reaches state-of-the-art results.
Researcher Affiliation | Academia | Technion, Israel Institute of Technology
Pseudocode | Yes | See the Appendix for the full differential permutation loss algorithm.
Open Source Code | Yes | GitHub repository with appendix, code, baselines, and data: https://github.com/urielsinger/EqGNN
Open Datasets | Yes | Pokec (Takac and Zabovsky 2012). Pokec is a popular social network in Slovakia. An anonymized snapshot of the network was taken in 2012. User profiles include gender, age, hobbies, interest, education, etc. The original Pokec dataset contains millions of users. We sampled a subnetwork of the Zilinsky province. We create two datasets, where the sensitive attribute in one is the gender, and region in the other. The label used for classification is the job of the user. The job field was grouped in the following way: (1) "education" and "student", (2) "services & trade" and "construction", and (3) "unemployed". NBA (Dai and Wang 2021). This dataset was presented in the FairGNN baseline paper. The NBA Kaggle dataset contains around 400 basketball players with features including performance statistics, nationality, age, etc. This dataset was extended in (Dai and Wang 2021) to include the relationships of the NBA basketball players on Twitter. The binary sensitive attribute is whether a player is a U.S. player or an overseas player, while the task is to predict whether the salary of the player is over the median.
Dataset Splits | Yes | For all baselines, 50% of nodes are used for training, 25% for validation and 25% for testing.
Hardware Specification | Yes | All experiments used a single Nvidia P100 GPU, with an average run of 5 minutes per seed for Pokec and 1 minute for NBA.
Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not specify software dependencies such as Python or PyTorch/TensorFlow versions, or other library versions.
Experiment Setup | Yes | For all baselines, 50% of nodes are used for training, 25% for validation and 25% for testing. The validation set is used for choosing the best model for each baseline throughout the training. As the classifier is the only part of the architecture used for testing, an early stopping was implemented after its validation loss (Eq. 7) hasn't improved for 50 epochs. The epoch with the best validation loss was then used for testing. All results are averaged over 20 different train/validation/test splits for Pokec datasets and 40 for the NBA dataset. For fair comparison, we implemented grid-search for all baselines over λ ∈ {0.01, 0.1, 1, 10} for baselines with a discriminator, and γ ∈ {0, 50} for baselines with a covariance expression. For both Pokec datasets and for all baselines λ = 1 and γ = 50, while for NBA we end up using λ = 0.1 and γ = 50, except for FairGNN with λ = 0.01.
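
The evaluation protocol described under Experiment Setup can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the names `split_nodes`, `EarlyStopper`, `NUM_SPLITS`, and `GRID` are hypothetical. It also assumes that the split is a uniform random shuffle of node ids and that the grid is the Cartesian product of the λ and γ values, neither of which the text above confirms.

```python
import itertools
import random


def split_nodes(num_nodes, seed):
    """Shuffle node ids and cut them 50% / 25% / 25% (train / validation / test).

    Assumption: a uniform random shuffle; the source only gives the proportions.
    """
    ids = list(range(num_nodes))
    random.Random(seed).shuffle(ids)
    n_train = num_nodes // 2
    n_val = num_nodes // 4
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


class EarlyStopper:
    """Tracks the best validation loss and signals a stop after `patience` stale epochs."""

    def __init__(self, patience=50):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.stale = 0  # epochs since the last improvement

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


# Number of train/validation/test splits that results are averaged over.
NUM_SPLITS = {"pokec": 20, "nba": 40}

# Search values from the setup text: lambda applies to baselines with a
# discriminator, gamma to baselines with a covariance expression.
LAMBDAS = [0.01, 0.1, 1, 10]
GAMMAS = [0, 50]
# Assumption: methods that have both terms search the full product.
GRID = list(itertools.product(LAMBDAS, GAMMAS))
```

In use, one would call `split_nodes(num_nodes, seed)` once per seed (20 seeds for Pokec, 40 for NBA), feed the classifier's validation loss to `EarlyStopper.step` after every epoch, and evaluate on the test nodes with the weights saved at `best_epoch`.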