Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Fair Graph Representations via Automated Data Augmentations
Authors: Hongyi Ling, Zhimeng Jiang, Youzhi Luo, Shuiwang Ji, Na Zou
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our Graphair consistently outperforms many baselines on multiple node classification datasets in terms of fairness-accuracy trade-off performance. |
| Researcher Affiliation | Academia | Hongyi Ling, Zhimeng Jiang, Youzhi Luo, Shuiwang Ji, Na Zou; Texas A&M University, College Station, TX 77843, USA; EMAIL |
| Pseudocode | Yes | We summarize the training algorithm for Graphair and provide the pseudo codes in Algorithm 1. |
| Open Source Code | Yes | Our code is publicly available as part of the DIG package (https://github.com/divelab/DIG). |
| Open Datasets | No | The paper mentions 'NBA is extended from a Kaggle dataset' and 'Pokec-z and Pokec-n are sampled from a larger social network Pokec' but does not provide specific links, DOIs, repository names, or formal citations with authors and year for direct public access to these datasets. |
| Dataset Splits | Yes | we randomly split 10%/10%/80% for training, validating and testing the classifier. ... We randomly split 20%/35%/45% for training, validating and testing the classifier. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions using GCN models and Adam optimizer but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | For Graphair, we adopt two-layer GCN models as the adversary model k and augmentation encoder g_enc, and a three-layer GCN model as the representation encoder f. We use 64 as the hidden dimension in all three models. For the augmentation model, we use an MLP model with 2 layers, the hidden size of 64, and ReLU as the non-linear activation function for MLP_A and MLP_X. The hyperparameter β is set to 1, and the hyperparameters α, γ and λ are determined with a grid search among {0.1, 1, 10}. ... We train the models for 500 epochs using Adam optimizer with 1 × 10⁻⁴ learning rate and 1 × 10⁻⁵ weight decay. |
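
The Dataset Splits and Experiment Setup rows can be read as a small configuration. The sketch below is illustrative only and is not taken from the authors' DIG code: the class name `GraphairConfig`, its field names, and the helpers `grid_configs` and `random_split` are hypothetical. It records the quoted hyperparameters, enumerates the grid over α, γ and λ, and draws a random train/validation/test split by fraction. The quoted text does not say which dataset uses which split ratio, so the ratios are passed in as arguments.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphairConfig:
    # Values quoted in the Experiment Setup row; field names are hypothetical.
    hidden_dim: int = 64
    adversary_layers: int = 2      # GCN adversary model k
    aug_encoder_layers: int = 2    # GCN augmentation encoder g_enc
    rep_encoder_layers: int = 3    # GCN representation encoder f
    mlp_layers: int = 2            # MLP_A and MLP_X, ReLU activation
    epochs: int = 500
    lr: float = 1e-4               # Adam learning rate
    weight_decay: float = 1e-5
    beta: float = 1.0              # fixed to 1 in the quoted setup
    alpha: float = 1.0             # alpha, gamma, lam are grid-searched
    gamma: float = 1.0
    lam: float = 1.0


def grid_configs(values=(0.1, 1.0, 10.0)):
    """Yield one config per (alpha, gamma, lambda) combination."""
    for alpha, gamma, lam in itertools.product(values, repeat=3):
        yield GraphairConfig(alpha=alpha, gamma=gamma, lam=lam)


def random_split(num_nodes, fractions, seed=0):
    """Shuffle node indices and cut them into train/val/test by fraction."""
    train_frac, val_frac, _ = fractions
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)
    n_train = round(train_frac * num_nodes)
    n_val = round(val_frac * num_nodes)
    # Whatever remains after train and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


if __name__ == "__main__":
    configs = list(grid_configs())
    print(len(configs), "grid points")
    train, val, test = random_split(1000, (0.1, 0.1, 0.8))
    print(len(train), len(val), len(test))
```

Three values for each of three hyperparameters give 27 training runs per dataset. The seed argument is an addition for repeatability; the summary above does not report which seeds, if any, the paper fixed.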